


FUNDAMENTALS OF 
EDUCATIONAL MEASUREMENT 



FUNDAMENTALS OF
EDUCATIONAL MEASUREMENT 

WITH THE ELEMENTS OF 
STATISTICAL METHOD 



CHESTER ARTHUR GREGORY, Ph.D.




D. APPLETON AND COMPANY 

NEW YORK LONDON 

1922 



COPYRIGHT, 1922, BY

D. APPLETON AND COMPANY 






PRINTED IN THE UNITED STATES OF AMERICA



PREFACE 

This book is an attempt to bridge, in part, a gap be- 
tween theory and practice in educational tests and measure- 
ments. The leaders in the movement for a quantitative 
study of educational problems have advanced so far into 
the theory and more complex practical phases of the work 
that a very large percentage of the teachers and students 
have considerable difficulty in following them. Most of 
the books on the subject have either been more or less 
technical, pre-supposing considerable training on the part 
of the readers in the field, or they have been largely
manuals of directions for giving the tests and scoring the 
papers, with little reference to the theory and to the prob- 
lems which have confronted those attempting to measure 
educational processes and products. This book deals with 
the processes and problems in a somewhat evolutionary 
way so that the teachers and students may see the order 
in which the problems have arisen and the attempted solu- 
tions of them. A mere manual of directions for giving 
tests and scoring the papers will not develop a professional 
spirit among teachers in this field. They must under- 
stand the fundamental principles, or the work becomes 
purely mechanical and non-professional. It has been the 
aim of the author to present the fundamental principles 
in non-technical language, as far as it is possible to do so, 
and to confine the statistical treatment of the data almost 
entirely to simple operations in arithmetic. 

The author has drawn freely on the works of Drs.
Edward L. Thorndike, Leonard P. Ayres, Harold O. Rugg,
W. C. McCall, Lewis M. Terman, and others to which he
desires to give due recognition. 

The author desires also to thank Prof. H. E. Douglass 
and Dr. B. W. DeBusk for reading parts of the manuscript 
and especially to thank Professor W. C. McInnis of the
Eugene High School for the careful reading of the manu-
script and the helpful suggestions made.

C. A. G. 



CONTENTS 

PREFACE v

CHAPTER

I. Introduction 1

Efficient American Citizens 3 

Progress Conditioned by Ability to Measure 4

Working Hypothesis for Acquiring Efficiency 8

II. Efficient Through Measurements .... 10 

Statement of and Adherence to Definite Aims . 10 

Value of General Aims Limited ... 11 

A Danger to Be Avoided 12 

The Technical Scientist and Educational Creeds 12 

The Elimination of Waste 14 

Limited Quantities of Things Make Measure- 
ments Necessary 14 

Waste Not Peculiar to American Schools . 14 

Human Energy Must Be Conserved ... 15 

Opportunities for Waste in Education . . 15 
The Business Man Eliminates Waste through 
Accurate Knowledge of His Processes and 

Products 16 

Wastes in Education Many and Varied . . 17 
Four Economies in Education .... 18 
Placing Education on a Factual Basis Through Edu- 
cational Measurements 20 

Two Kinds of Opinions and Their Uses . . 21 

Opinions Worthless in the Face of Facts . 22 

Inductive Methods Not Used in Pedagogy . 24 
Lines of Demarcation between Knowable Facts 
and Philosophical Opinions Must Be Sharply 

Drawn 24 






Data on Simplest School Room Procedure Lack- 
ing 25 

Cultural Development Supplemented by Mechan- 
ical Phases of Education 26 

No Excuse for Mere Opinions Where Facts Are 

Available 28 

Relation of Time Consumed to the Finished

Product 28 

The Public Is Asking for a Ledger Account of 

Our Business 31

Inability to Show Facts Has Worked a Hard-
ship on the Teaching Profession 32
The Establishment of Definite Standards ... 34 
Three Kinds of Standards Needed ... 36 

Standards as to Quantity 36 

Standards as to Time 38

Standards as to Quality 39 

Standards Changing 39 

Profiting by the Experience of Those in Other Occu- 
pations and Professions 39 

The Manufacturer Has Specifications ... 41 
Meeting Some Objections to Educational Tests and
Measurements 43 

1. Tests Will Not Endure 44 

2. Child Mind too Complicated to Measure 44

3. The Judgment of a Competent Man Better
than Scales 45 

4. Tests Tend to Reduce All Educational Work 

to a Dead Level with no Allowance for Indi- 
vidual Differences 45 

5. Tests Measure so Small a Part of Intellectual 

Life that They Are Not Indicative of Gen- 
eral Ability 47 

Keeping an Accurate Record of All Methods Tried 

and Progress Made 49 

The Cultivation of the Confidence and the Utilization 
of the Support of the Public 49 




Teachers Must Know Why Sweeping Changes 

Are Made 49 

The Public Is Interested in Education . . 51 

Fields of Educational Tests and Measurements . 53 

III. The Measurement of Intelligence .... 56 

General Statement of the Problem .... 56 

Why Work in Psychological Measurements Was Re- 
tarded . . . . ... . . .60 

Effects of Wundtian Laboratories .... 60 

What Is General Intelligence? 63 

Is Intelligence a General Faculty of the Mind? . 65 
Inability to Define Intelligence Accurately Does Not 

Prohibit Measurements 67 

The Binet Tests 67 

A Tabular Synopsis of the Binet Scale, 1911 Edition 68 

The Binet Tests Had Many Innovations ... 72 
The Constituent Functions of Intelligence Must Be 

Brought into Play 72 

The Kind of Mental Functions Brought Into Play 74 
Establishment of a Zone of Normality ... 75 
Criteria for Separating the Normal from the Sub- 
normal 79 

Are Differences in Intelligence One of Degree or of 

Kind? . 82 

Choosing Tests to Measure Intelligence ... 83 
Tests Must Not Be Influenced by External and Chance 

Conditions 84 

Only Those Tests Must Be Chosen That Afford a
Decided and Reliably Symptomatic Value, General

Applicability and Possibility of Objective Evaluation 85
Tests Must Not Depend Too Much on the Ability to 

Use Language 85 

Determining the Age to Which a Test Should Be 

Assigned 86 

Problems in Scoring 87 




The All-or-none Method in Scoring 88
Shall a Child Be Required to Pass All the Tests

at Each Age Level? 88

With What Tests Shall the Examination Begin? 89
The "At Age" and the Normal Child ... 90 
Binet Tests Give More Than a Composite Picture 90 
The Coefficient of Mental Age and the Intelligence

Quotient 90 

Limitations of the Tests 91 

The Age of Mental Maturity 91 

Criticisms of the Binet Tests 93 

Limits of Traits Not Determined by the Binet Scale 94 
The Binet Scale Criticised on Other Points 95
Some Favorable Criticisms of the Binet Scale . 99 
Other Problems Confronting Those Testing Intel- 
ligence 100 

Does the Defective Progress Normally to a Cer- 
tain Point and Then Suffer Arrest or Has 
Mental Development Been Retarded from 

Birth? 100 

Does the Defective Child Have the Same Mental 
Equipment as the Normal Child of the Same 

Mental Age? 101

Do Feeble-Minded Children Mature Mentally at
the Same Chronological Age as Normal Chil-
dren? 101

Are Subnormal Children Equally Deficient in all
Abilities? 101 

IV. The Measurement of Intelligence — Continued 104
The Extension and Revision of the Binet Scale and 

Other Measures of Intelligence .... 104 
The Stanford Revision of the Binet Scale . . 104 
To Find the Mental Age of a Child by the Stan-
ford Revision 106

The Picture Completion Tests 107

The Form-board for Measuring Intelligence . . 108 




A Scale of Performance Tests 109 

A Point Scale for Measuring Mental Ability 111
Proposed Reorganization of the Binet Scale by Meu- 

mann 117 

Endowment or Intelligence Tests Proper 117 

Tests of Development in the Narrower Sense . 118 

Tests of Environment 118 

Wallin's Criticisms 118 

Other Tests Devised to Measure Intelligence . . 119 

Group Intelligence Tests 122 

Principles Involved in the Selection of Group Tests 123 
Requirements of Group Tests Are Many . . . 124 
The Terman Group Tests of Mental Ability . . 125 

The National Intelligence Tests 127 

The Haggerty Intelligence Examinations . . . 129 
The Otis Group Intelligence Scale .... 131 
The Dearborn Group Intelligence Tests . . . 132 

Uses of Intelligence Tests 132 

Summary and Evaluation of the Measurements of 

Intelligence 136 

The Methods Are Yet Crude 137 

V. The Need for Definite Measurements of School

Achievements 147 

Time Consumed in Giving Examinations . . . 147 
Attitude of Pupils and Teachers Toward Examina- 
tions 148 

The Marking System Now in Vogue .... 150 
Scientific Measurement of School Achievements is 

new 152 

Those in the Profession Must Take the Initiative for 

Improvement . . 153 

Purposes of Educational Tests Are Not Generally 

Understood 154 

The Problem to Be Solved 154 

What School Achievement Tests Measure . . 155 




Experimental Evidence to Show that School Marks

Are Inadequate 156

Marking System Inefficient Because it Does Not Indi-
cate Progressive Degrees of Merit 158

Definition of a Scale 159 

A Valid Scale Must Have Equal Steps and Each 

Step Bear a Definite Relation to the Zero Point 161
More Exact Measurements Will Make Education a

Science 164 

Supervision Improved as Ability to Measure Increases 166 
Our Educational Scales Have Been Subjective . 167 
Tests Do Not Indicate the Cause of Conditions 168
How Standard Tests Differ from Ordinary Examina- 
tions 169 

Illustrating the Law of the Single Variable 172
Time of Day When a Test Should Be Given . . 174 
Number of Times a Test Should Be Given . . 176 
How Standardized Tests Are Helpful in Improving

Instruction 177

What Kind of School Achievement Tests Is Most
Important? 178

VI. The Classification of School Achievement Tests

and the Fundamental Principles for Design-
ing Them

Diagnostic vs. General Tests
Degree to Which Tests Are Diagnostic
Formal Tests and Reasoning Tests
Rate Tests and Development Tests
Quality, Difficulty, and Time or Amount Tests 187
Measurements by Opinion, Comparison and by Stand-
ard Tests 189

Classification by Educational Tests 190
Educational Age and Mental Age Compared 192
Accomplishment and Educational Quotients 193
Principles for the Choice of Subject-Matter for Edu-
cational Tests and Scales 191




Information Desired Determines the Type Subject- 
Matter 196 

Some Characteristics of an Ideal Educational Scale 196 
The Establishment of a Zero Point . . . 196 
Making the Steps of Equal Magnitude . . 202 
The Scale Must Measure the Desired Educa- 
tional Product 211 

The Test Must Be so Simple in Its Application 
That It Is Adapted to the Classroom . . 211 
Tests Must Not Require an Undue Amount of Time 
in Administration 212 

VII. Scoring the Tests and the Treatment of the 

Measures 214 

The Problems of Scoring 214 

Example in the Development of a Geography Test 214 
State Examination Questions Examined . . . 216 

The Division Chosen 216 

The Selection of Cities to be Used in the Test . 217 
Advantages of Tests Thus Designed .... 221 
Time is Saved for the Pupil .... 221 
Personal Equation is Eliminated . . . 221 
Much Time is Saved in Scoring .... 221 
More Ground May be Covered by a Test Thus 

Designed . 221 

Pupils Will Review the Whole Field in Pre- 
paring for the Examination .... 222 

The Determination of Scores 222 

Where There is Choice Among More Than Two 

Answers 223 

Some Objections to Tests of This Kind . . . 224 
Effect of Incorrect Statements Being Placed Before 

the Student 225 

The Values to be Assigned to Scores . . . 226 

General Problem of Weighting Scores . . . 227 

By the Teacher's Judgment 228 




By Weighting the Parts According to the Dis-
tribution of Abilities as Shown by the Normal

Frequency Curve 228

Accumulation Scores and Scores of Greatest Diffi-
culty 229

VIII. The Measurement of Educational Processes and

Products in Five Fields of Public School Work 232

Measures of the Materials of Instruction . . 233 

Determination of a Spelling Vocabulary . . 233 

A Study of Reading and Spelling Vocabularies

of the Books Used in the First Three Grades 237 

The Problem 237 

Terms Used in the Study 238 

The Contents of Three American Histories . . 240 
The Measurements of the Physical Growth of School 

Children 249 

The Measurements of the Money Cost of Education 251 
The Measurements of School Buildings . . . 251 
The Measurements of Retardation, Acceleration, and 
Elimination 253 

IX. Educational Statistics, General Statement . 260 
The Use of Statistics in Other Fields 261

The Question of Error 262 

Distribution of Measures About a Point of Central 

Tendency 262 

Educational Measurements Compared With Measure- 
ments in Other Fields 264 

Quantities Measured Indirectly 266

Definition of Statistics 267

Laws of Statistical Regularity 268

Methods of Statistics 269

Limitations of Statistics 270

Standard of Accuracy 270

Compensating vs. Cumulative Errors .... 270 

Discrete and Continuous Series 271 

Undistributed Measures 271 




Rules for Tabulating Data 272

General Directions for Making a Scale and Curve- 
Plotting 273 

Designating the Class Intervals 274 

Analysis of Results 275 

The Need for Understanding Statistical Formulas 276 

X. The Measurement of Central Tendency, or Aver-
ages 278

Averages 279 

The Arithmetic Mean 279 

Computation of the Arithmetic Mean by the Short 

Method 283 

Summary of Steps in the Computation of the Arith- 
metic Mean by the Short Method .... 286 

The Mode 286 

The Median 287 

The Spread of the Score Interval Commonly 

Used in Statistics and School Practice . . 289 
What Formula Shall Be Used in Computing the 

Median? 290 

Computation of the Median, Simple Distribution . 291 
Case 1. The Number of Items in the Distribution 

Is Odd 291 

Case 2. The Number of Items or Scores Is Even 292 
When the Distribution Is Complex .... 293 
Case 3. Where More Than One Pupil Make the 
Same Score and the Data Are 
Grouped in Class Intervals . . 293 
Case 4. Where the Median Falls in the 100 or 

the Zero Class Interval . . . 297 
Case 5. When the Partial Sum is the Half Sum 

and There Is no Correction . . 298 
Case 6. Where the Measures Are Discrete . 299 
Case 7. Where the Median Falls in the Class 

Interval Containing no Cases . . 300 




Summary of Steps in the Computation of the Median 301

Comparison of the Arithmetic Mean, Median, and
Mode 301

Quartiles and Percentiles 303

Measurements of Dispersion, or Variability 304

How Variability Is Measured 305

The Measures of Absolute Variability 305

Computation of Mean Deviation 308

Computation of Mean Deviation: Data Grouped in
a Frequency Distribution 310

Summary of Steps in the Computation of the Mean
Deviation by the Short Method 314

The Computation of Standard Deviation 315
The Computation of Standard Deviation by the Short

Method 316

Summary of Steps in the Computation of Standard

Deviation by the Short Method 316
The Coefficient of Variability 321

The Measurement of Relationship, or Correla-
tion 324

Need for Measures of Relationship 324

Illustrating the Computation of the Coefficient of
Correlation: Data Simple and Ungrouped 328

Illustrating the Computation of the Coefficient of
Correlation: Data Complex and Grouped in Class
Intervals 331

Summary of Steps in the Computation of the Co-
efficient of Correlation 335

Illustrating the Computation of the Coefficient of
Correlation by the Short Method (Adapted from
Ayres) 337

Bfv "^ Oonrdation Betweoi Two 

fhod . • . • 344 




Many Pairs of Values With Data Grouped in Class 
Intervals 346 

Finding the Equation of a Straight Line of Regression 351 

Pearson's Equation for a Line of Regression . . 353 

The Reliability of the Correlation Coefficient . . 360 

Spearman's Method of Rank Correlation . . . 362 

Table of Squares and Square Roots 367 

Index 377 




FUNDAMENTALS OF
EDUCATIONAL MEASUREMENT

CHAPTER I

INTRODUCTION

The crying need of the hour is for efficiency. The fact
that there are few universal standards for efficiency does
not dampen the ardor of those demanding it. The commer-
cial world will tolerate almost anything but what it con-
siders inefficiency; the efficiency, or inefficiency, of our
railway systems constitutes no small part of the conversation
of the shippers and the traveling public; in politics we
talk about an efficient administration or an efficient public
servant; in education, the public wants an efficient school
system.



The subject of efficiency has occupied the center of the
stage at every noted gathering of educators in the United
States for the last decade. A belief is current that the
education of the day is far short of its optimum efficiency;
that there is something wrong with the public schools.
Because of this criticism the elementary schools have been
compelled to face, not, as of old, a criticism as to their
rights to existence or their right to receive public support,
but an adverse criticism as to their efficiency as measured
by their human product.

The educator can withstand almost any kind of criticism
but that which decries the efficiency of his school. Effi-
ciency is one of the chief goals for which he is striving.
It is the balance in which his success or failure must be 
weighed. Books, consisting of hundreds of pages each, 
have, as their chief contents, data dealing with the efficient 
administration of the various phases of school systems. 

Recently there has come from the press a five-hundred
page report on the Gary public schools made by the General
Education Board. The makers of this survey say in the
opening pages of their report:¹ "The public is interested
in knowing whether the Gary schools are efficient or in-
efficient as now conducted. The public is also interested
in knowing whether such a plan is sound or unsound. The
present study ... tries to do justice to both points."

The National Society for the Study of Education con-
sidered efficiency of such importance that it devoted Part
II of its Fourteenth Yearbook to the "Methods of Measur-
ing Teachers' Efficiency." Scores of other books on educa-
tion make efficiency no small part of their content.

The efficiency programme is not confined to the public 
schools. Education is but one phase of the newer social 
economy. When one runs through the card index of the 
library or consults the Reader's Guide, he is surprised to
see the number of books and magazine articles dealing with
the subject of efficiency in all lines of work. The states
of the Union appoint commissions on efficiency and econ- 
omy. Economy and efficiency in government service be- 
comes a part of the report of the President of the United 
States to Congress. Books on efficiency are written under 
such titles as the following: "Modern Methods in the Office,
How to Cut Corners and Save Money"; "Economics of
Efficiency"; "Psychology for Business Efficiency"; "Effi-
ciency as a Basis for Operation and Wages"; "Motion
Study, A Method for Increasing the Efficiency of the
Workman"; "Fatigue and Efficiency"; "The Price of
Inefficiency"; "The Human Machine and Industrial Effi-
ciency"; "A Symposium on Scientific Management and
Efficiency in College Administration"; and hundreds of
other books and articles dealing with this subject.

¹ Stuart A. Courtis, The Gary Public Schools: Measurement of Class-
room Products (General Education Board, 61 Broadway, New York
City), Introduction, p. 8.

The spirit of the new education is that of social efficiency.
Subjects will no longer find an excuse for being in the
course of study in the words of the old song, "We're here
because we're here," but each subject must prove that it
is one of the best things that can be taught in the few
short years of a child's school life.

The mechanic has a perfectly definite way of measuring
efficiency. It is the ratio that the useful work done by a
machine bears to the total work done on it. No machine
is one hundred per cent efficient, else a perpetual-motion
machine would be possible. There is always a waste. The
most efficient machine is the one which does the maximum
amount of work with a minimum of waste.
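
The mechanic's ratio can be illustrated with a short computation. The sketch below is only an illustration: the function name and the figures of 1,000 and 850 foot-pounds are hypothetical and do not come from the text.

```python
def mechanical_efficiency(useful_work_out: float, total_work_in: float) -> float:
    """Return useful work done by the machine divided by total work done on it.

    Both quantities must be expressed in the same unit (for example, foot-pounds).
    """
    if total_work_in <= 0:
        raise ValueError("total work input must be positive")
    if not 0 <= useful_work_out <= total_work_in:
        raise ValueError("useful work must lie between zero and the total work input")
    return useful_work_out / total_work_in


# Hypothetical figures: 1,000 foot-pounds put in, 850 foot-pounds of useful work out.
efficiency = mechanical_efficiency(850.0, 1000.0)
waste = 1.0 - efficiency  # the share of the input that is lost
print(f"efficiency = {efficiency:.0%}, waste = {waste:.0%}")
```

Under these assumed figures the machine returns 85 per cent of the work put into it and wastes the remaining 15 per cent.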

The efficiency of a school system is likewise a ratio. 
It is the ratio that the time, energy, and money spent 
bear to the finished product. The finished product in the 
case of the machine, that is, the work done, is measured in
some convenient unit of work, as the foot-pound, the
foot-poundal, the erg, or the kilogrammeter. Unfortu-
nately, the products of a school system are measurable
in no such definite terms. Descriptive adjectives with
vague meanings usually constitute the best and only
measuring sticks we have.

Efficient American Citizens. — It is said that one of the
chief businesses of the American schools is to prepare in-
dividuals for efficient membership in American society.
There are at least three qualifications for such member-
ship: (1) An ability to execute effectively the formal and
informal duties of citizenship and carry the burdens of
political responsibility; (2) an ability to labor and produce
so effectively that one is able to carry his own economic
load; (3) an ability to utilize one's leisure time and to
function in an individual capacity without infringing on
the rights of others or of society at large.¹

Few would disagree with these qualifications. It is only
when we attempt to particularize and say just what and
how much an individual must have to execute the formal
and informal duties of citizenship, carry the economic load 
and perform the other duties mentioned above, that dif- 
ferences of opinion arise. 

All agree that the schools should turn out boys and
girls efficient in arithmetic, language, civics, good character,
and all the rest, but there is little agreement as to just 
what one must do to become efficient in these things. 
Standards of efficiency and methods of reaching them have 
not always been based on scientific facts which might be 
verified in the laboratory. Before one can have an intel- 
ligent conception of efficiency, there must be some way 
to evaluate, to measure it. In fact, one cannot conceive of 
efficiency except in terms of quantity. 

It is the purpose of this treatise to set forth in a non- 
technical way, as far as possible, some of the benefits to 
be derived and the difficulties to be overcome in the use 
of measurements in the field of education. 

Progress Conditioned by Ability to Measure. — Progress
in civilization has depended very largely on our ability to
measure. James Watt, for instance, could not make a
steam engine until men were able to make measurements
so exact that a cylinder and a piston could be built that
were steam-tight and yet allowed free play. The automo-
bile of to-day had to wait until men could measure the
five-thousandth part of an inch, and the ship's chronom-
eter until they could measure distance five times more
minute than that.

¹ Alexander Inglis, Principles of Secondary Education, p. 342.

New measures are constantly coming into use. They are 
no longer restricted to length, area, weight, and volume. 
New commodities such as electric currents, light, heat, and 
refrigeration are measured and sold on the markets to-day
just as other commodities are sold. Progress in these lines 
has been conditioned by our ability to devise new measur- 
ing sticks to measure them. 

The physical sciences have taken every precaution to
standardize their units of measurement. At the United
States Bureau of Standards in Washington one finds
instruments that will weigh with accuracy down to
an ounce; high-temperature thermometers
that will register accurately 1,000 degrees above zero
Fahrenheit and pentane thermometers registering 300 de-
grees below zero Fahrenheit; saccharimeters measuring the
impurities in sugar by the twist of light waves which are
allowed to pass through a sugar solution; Emery testing
machines having a compressing power of 2,300,000 pounds
and a pulling power of 1,150,000 pounds per square inch.
It is no longer a matter of opinion as to the strength
of a steel girder, for instance, or the resistive power of a
steel rail. The scientist now speaks with authority along
these lines. Natural science has made its gains by sub-
stituting facts for opinions, and units of measure for mere
guess-work. Herschel says: "Numerical precision is the
soul of science."

When the teaching profession enters the stage where its 
data and conclusions can be presented in quantitative as
well as qualitative terms, it has entered upon a most im- 
portant stage of development. 

Men in all businesses and professions are becoming quan-
titative thinkers. Education is not an exception. Edu-
cators are seeking to verify and, in some cases, to refute
the established beliefs concerning the effects of educational
forces upon human nature. Dogmatic and authoritative
control of education is going the way of all mere authority
and dogma in human affairs. The "popular guessing con-
tests" that have been going on in education as to which
processes are the best, and what products are obtained from
them, are giving way to experimentally determined facts.
It means that education is emerging from among the voca-
tions and taking its place among the professions.

In the evolution of our civilization, one form of human 
activity after another has been subjected to exact measure- 
ment and made to yield its quota of natural law. This 
movement in education is but a part of a larger one which 
in recent years has extended the scope of the applied 
sciences and utilized scientific principles in the improve-
ment of many lines of human endeavor.

The American people are an extremely practical folk. 
Contemporaneous American life manifests an abiding faith 
in practicality and efficiency. The American people believe 
in a creed or philosophy of life that really "works" when 
applied to practical situations. They are pragmatists. 
They are looking away from first things, principles, cate- 
gories, supposed necessities, and looking towards last 
things, fruits, consequences, facts.¹ We have talked much
of the aims of education. Now we are asking for results.
We are going to judge the efficiency of a school system in
terms of the results and not in terms of its aims.

¹ William James, Pragmatism, p. 54.

For many years we have measured, in a way, the effi-
ciency of higher education by the definition of what con-
stitutes an institution of higher learning, which we find
in the laws of the various states, in the definitions of the
U. S. Bureau of Education, the Carnegie Foundation for
the Advancement of Teaching, and many voluntary associa-
tions that deal with this phase of education. These defini-
tions cover such topics as: entrance requirements, endow-
ments, income, curricula, the school plant, time allotments,
and the qualifications of the teaching staff. Such standards
measure the equipment and materials for instruction but
not the product.

Much emphasis has been laid on the statement that the
teacher's power is exerted in one generation, while that
of his students is exerted in another; that for him who
teaches there is no final measure of the day's work; that 
it is beyond his vision in time and place. The next genera- 
tion may attempt a full estimate of his labor, but he him- 
self cannot. 

Teachers have accepted these general statements more 
literally than the facts warrant. It is true that the full 
power of one mind over another cannot be measured. 
Nevertheless, if the intricate ministry of teaching is to 
become anything more than a crude art, where blind faith, 
subtle intuition, and crude methods of trial and error, are 
the rules of procedure, standards must be set up and meas-
urements must be made, not in the next generation, but
in the midst of the educational processes now going on. 

The ideal condition would be for the teacher to see the 
man of his moulding walking about full-grown among his
neighbors and performing his daily duties and graces. No 
other measure of one's work equals the sight of the product 
put to its full uses. It is the best corrective of one's 
blunders, the quickest encouragement to efficient action. 
Unfortunately, this satisfaction is reserved for the lesser 
craftsmen of life. We shall have to be content with conditions
less favorable. This does not mean, however, that
because we cannot measure our educational products com- 
pletely, we cannot measure them at all. Some phases of 




our educational processes and products are quite amenable
to measurements. The hazard is too great to wait for the
next generation to evaluate our products. Definite, attainable
goals, therefore, are being set up. Teachers are being
required to find the situation, or series of situations, that 
will produce the desired results. When a child is put 
through the education situation, we want to know imme- 
diately the quality and amount of change made. If the 
results are not satisfactory, we put him through the situa- 
tion again or make a new situation that will produce the 
change desired. 

Working Hypotheses for Acquiring Efficiency. — No at- 
tempt will be made here to define an efficient school system. 
Enough has been said about efficiency to give the general 
direction of the goal toward which we are striving. No 
attempt will be made to discuss educational creeds unless 
measurements become a factor in determining them. The 
chief problem, rather, is this: having decided upon a desired
result, how may measurements be utilized to bring
about that result with the minimum expenditure of time, 
money, and energy? Our hazy ideas about an efficient 
school system will be clarified only by approaching a little 
closer to the goal sought. While we do not know exactly 
what an efficient school system is, we know the road that 
leads to it. 

Whatever the definition of an efficient school system may 
be, the following propositions may be readily accepted as 
working hypotheses: 

1. Of two school systems attempting to make efficient
American citizens, that system is the better which produces
the desired finished product with the minimum waste in
time, energy, and money expended.

2. More accurate measures than we now have of those
things that are measurable will reduce the waste in education
and make our schools more efficient.




It is with the second proposition that we are primarily 
concerned. An attempt will be made to show how the 
efficiency of the schools may be increased by a more ac- 
curate measurement of certain materials, processes, and 
products. Three activities are necessary to do this. It is 
necessary: (1) to persuade the teachers that measurements 
are beneficial; (2) to make the measurements; (3) to use 
the results of the measurements as a means for doing better. 

Knowledge is the first attribute of the man of science.
It is knowledge, born of the travail of thought and experience,
that differentiates the physician from the quack;
the lawyer from the shyster; the statesman from the 
demagogue; it is likewise the first indispensable element 
of educational sanity and progress. Education has hitherto 
rested on the foundation of custom. It must hereafter rest 
on a basis of scientific knowledge. Facts alone will enable 
the educator to keep his balance in the midst of educational 
upheavals. Because of the lack of scientific information, 
many theories not justified by systematic observation have 
been current. As a result, much of the time and energy 
of teachers have been spent to a great disadvantage; confusion
has been produced; and the teaching profession has,
at times, been greatly retarded. 

In spite of the fact that the educational literature is 
replete with data and arguments why teachers should use 
measurements to eliminate waste and do more effective 
teaching, nevertheless the majority of teachers are not 
availing themselves of this opportunity. Tradition and 
habit are still too strong to allow any radical changes. Of 
those teachers who have used these newly acquired tools, 
many have treated them as "educational curiosities" rather
than as a means for more efficient work. It is because of 
these conditions that considerable space is given in the next 
chapter to reasons for the need of measurements. 




EFFICIENCY THROUGH MEASUREMENTS



In this chapter we shall discuss some reasons why measurements
should be made, hoping thereby to persuade a
greater number of teachers that measurements are necessary
for the most effective teaching. The subject will be
discussed under the following headings, all of which have 
to do with measurements either directly or indirectly. The 
general thesis is that the schools may be made more efficient 
by: (1) statement of and adherence to definite aims; (2) 
the elimination of waste; (3) placing education on a fact 
basis through educational measurements; (4) the establishment
of definite standards; (5) profiting by the experience
of those in other occupations and professions; (6) meeting
some objections to educational tests and measurements;
(7) keeping an accurate record of all methods tried and 
progress made; (8) the cultivation of the confidence and 
the utilization of the support of the public. 

I. STATEMENT OF AND ADHERENCE TO DEFINITE AIMS

Professor Bobbitt has tersely stated the situation in regard
to our aims in education. He says:* "We have aimed
at a vague culture, an ill-defined discipline, a nebulous,
harmonious development of the individual, an indefinite
moral character-building, an unparticularized social efficiency,
or, often enough, nothing more than escape from
a life of work."



Mifflin Co.), 




It is evident that, as long as we deal with such rainbow
generalities, we can never hope to get definite results. Our
results never will be more definite than our aims. One
of the first things that the establishment of standard norms
will do for the science of education will be to make definite
and specific the aims of teaching.

We have a bountiful supply of general aims of more or
less value. Education is a training for citizenship; it is
to prepare for life; it is to develop power; it is to make
cultured men and women; it is to prepare for the vocations;
it is to develop a moral individual, endowed with
the power of independent thought, with the ability to earn
an honest livelihood, with culture, refinement, and a broad
and intelligent interest in human affairs. The aims of
education are sometimes stated in terms of culture and
sometimes in terms of productive efficiency. The tendency
now most in favor among progressive school men is that
emphasized by the very principle for which social economy
stands; that education is to be tested neither by culture
in the abstract, nor utility in the concrete, but by the
extent to which any slight knowledge, any manual dexterity,
and any useful "tricks" of spelling, counting, reading,
etc., are assimilated into the organic complex of personal
character.

Value of General Aims Limited. — It is easy to agree
with general aims but they are of limited value in the
school room. Where one is confronted with a particular
boy, with a particular task, as adding a ten-figure column,
such aims as preparation for life, citizenship, culture,
power, etc., seem remote and ineffectual. The controlling
purpose of education must be sufficiently particularized
that we may know when any part or unit of the aim has
been accomplished. An objective such as "good citizenship,"
for instance, must first be reduced to smaller units
such as being a good neighbor, a good parent, a wise and




conscientious voter, etc. These units must, in turn, become 
material for further analysis until we finally get down to
definite "working units." The first thing we have to 
settle is what we want our schools to produce. Our courses 
of study at the present time easily could be filled with many 
objective statements of aim which would be so specific that
there would be no question whether or not they were car- 
ried out in practice. 

When we are challenged to justify the increasing cost
of education and the multiplication of courses necessary 
to meet the demands of complex life, we apply the prag- 
matic test. For instance, what practical difference would 
it make if this were incorporated into the course, and if 
that were left out? The progressive teacher is constantly 
confronted with the decision of accepting an educational
fad for a scientific fact. Measurements offer a means by 
which she can test out the relative values of aims by tracing 
them to their consequences. Tests and measurements, there- 
fore, become an important guide in evaluating the pro- 
posals of educational changes. 

A Danger to Be Avoided. — There are many dangers 
incident to the use of measurements which will be discussed 
later in this chapter. Only one will be mentioned here, 
namely, the danger of making the real aims and ends of
education subservient to measurements, instead of using 
measurements to get the desired results. The good physi- 
cian is interested primarily, not in the medicine, nor in 
the methods of administering it, but in the effect it will
have on the patient, and we should regard him as a poor 
physician who thinks more of the medicine and the way 
he gives it than he does of the patient's progress toward 
recovery. 

The Technical Scientist and Educational Creeds. — It is 
not the business of the technical scientist to tell us what 
shall be the aims of education. That is a matter of creed. 



Those belonging to a creed usually set up an ideal toward
which they strive. For instance, there are those in educa- 
tion who look primarily to the subjective results; the en- 
riched mind; refined sensibilities, culture, discipline, and 
the like. Those belonging to this creed say that education 
is the ability to live rather than the practical ability to
produce. 

Another creed has as its aim efficient practical action in 
a practical world. The individual is educated who can
perform efficiently the labors of his calling; who can co- 
operate effectively with his fellow in social civic affairs. 
This creed would have science studies in order that the 
facts may be put to work by the mechanic in his shop, 
and the farmer on his farm. It would make a survey of 
the science needs of the community and teach only those 
things which make a direct contribution to these needs.*
The scientist in education has practically nothing to do 
with the formation of these creeds. He simply says that 
you must make use of this means, if you wish to reach
this or that particular end. But no technical science can
decide within its limits whether the end itself is really a 
desirable one. The technical specialist knows how he ought 
to build a bridge, or how he ought to dig a tunnel, presupposing
that the bridge and the tunnel are desirable.
Whether they are desirable or not is a question that does
not concern him. He simply says that if you wish this
end, then you must proceed this way.

When aims are definite, measurements are made on the
materials of instruction so as to meet the specifications set
forth in the aim. Everything that does not contribute
to the aim is then discarded as material that is superfluous.




* Ibid., pp. 3-4.




II. The Elimination of Waste

Limited Quantities of Things Make Measurements
Necessary. — If everything with which human activity is
in any way concerned were unlimited, there would be no 
need for measurements. Even if things were not actually 
unlimited, if there was always enough of any one of them 
to be had with little or no expenditure of energy, we
would concern ourselves but little about quantity, and 
hence measurements would be practically unnecessary. 
The very fact that most of the things with which we deal 
are limited, forces us to measure in order to best conserve 
them. A cash register is used because money is limited 
and therefore must be measured in order to conserve it. 
Nature has limited the time allotted to man to do things, 
and, because of this fact, he has built labor-saving machines
with which distances are annihilated and work is
done as rapidly as possible so that time may be saved. 
Because our available supply of energy is limited, we must 
conserve each unit to be utilized in doing the world's 
work. 

One of the fundamental principles to be kept in mind
in the solution of educational problems is that everything
with which the educator works (time, energy, money,
apparatus, resources of all kinds) is so limited that trusting
it to the hands of the ignorant cannot be endorsed,
and prodigality with it is crime.

Waste Not Peculiar to American Schools. — Americans 
are the most wasteful people in the world. Our bountiful 
natural resources, our great expanse of land, our seem- 
ingly exhaustless supply of those material things that 
supply human wants have made us wasteful. Suppose, by 
our unscientific methods in farming, we did deplete the
soil, was not there plenty of new land awaiting us? What
did it matter if millions of dollars' worth of natural gas 




was allowed to go to waste in obtaining oil so long as the 
oil fields yielded a profit? Thousands of tons of fish have 
been caught and wasted because men have failed to see 
that the demand might some day overtake the supply. 
Timber in our great forests has been ruthlessly destroyed 
because a short-sighted public did not stop to consider that
the end was already in sight. Now that the new lands are 
about all gone, many of the natural resources wasted, the 
population increasing, and competition growing sharper 
every day, we have at last begun to realize that we must
conserve while there is yet an opportunity.

Human Energy Must Be Conserved. — Because human
energy is limited, it becomes necessary to economize and 
distribute it in such ways that it will accomplish the most. 
If we put forth more energy than is necessary to do a 
thing, there is waste. Likewise there is waste if less energy
is put forth than is necessary to accomplish the task. We 
do our most difficult tasks with the least waste of power 
and energy when we accurately adjust our energies to the
task to be done. The ends to be realized many times are
remote and complex, and if we use adequate means, distances
in space, remoteness in time, units of energy necessary
to do the task, quantity of some kind must be taken
into account, and this means that measurements must be
made. Waste no doubt has occurred in education more
because of a lack of scientific methods of conservation than 
because of any indifference or negligence on the part of 
teachers. 

Opportunities for Waste in Education. — When fully
considered, much of the great waste in education is due
to our lack of adequate means for placing reliable estimates
on our results and processes. We lack in the matter of
definite, desirable, attainable goals to be sought through
a given topic, or process, or stage of work in a given subject.
We have been forced to work in a more or less




blind, do-and-trust-to-luck way. Wherever the application
of scientific measurements to the achievements of school
children has been made, it has shown that great waste and
unbusinesslike methods are being practiced. A school system
should meet the same requirements that a business
corporation must meet. The output must be commensurate 
with the expenditure. 

Just as in the business world the greatest economies are 
effected through small savings, so the school must expect 
to make its greatest gains by checking up the small leaks 
in time, energy, and expense. A saving of thirty-five minutes
a day, for instance, would save a child one school year
in eight, or a saving of three and one-half minutes per 
day would mean a saving of one school month in the course. 
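
The arithmetic behind these figures can be checked with a short sketch. The text does not state the length of the school day or of the school year; the 280-minute day, the ten-month year, and the function name below are assumptions, chosen because they reproduce the author's two figures exactly.

```python
from fractions import Fraction

# Assumed values (not given in the text): a 280-minute school day,
# an eight-year course, and ten school months in a school year.
SCHOOL_DAY_MINUTES = 280
COURSE_YEARS = 8
MONTHS_PER_SCHOOL_YEAR = 10


def months_saved(minutes_saved_per_day):
    """Return the school months saved over the whole course."""
    share_of_day = Fraction(minutes_saved_per_day) / SCHOOL_DAY_MINUTES
    course_months = COURSE_YEARS * MONTHS_PER_SCHOOL_YEAR
    return share_of_day * course_months


# Thirty-five minutes a day, expressed in school years.
years_at_35 = months_saved(35) / MONTHS_PER_SCHOOL_YEAR
# Three and one-half minutes a day, expressed in school months.
months_at_3_5 = months_saved(Fraction(7, 2))

print(years_at_35)    # school years saved in the eight-year course
print(months_at_3_5)  # school months saved in the eight-year course
```

Under these assumptions thirty-five minutes is one-eighth of the school day, so one year in eight is saved; three and one-half minutes is one-eightieth of the day, or one month in eighty.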

The Business Man Eliminates Waste through Accurate 
Knowledge of His Processes and Products. — The keynote 
of successful business to-day is accurate knowledge of de- 
tails applied in such a manner as to eliminate needless 
waste. The margin between profit and loss is determined 
by the skill of the manager in effecting small savings. 
Science applied to the meat-packing industry, for instance, 
showed that a very big profit could be made by utilizing 
what had formerly been considered useless. In the busi- 
ness world nothing is left to chance. No important action 
is based upon vague opinion or untested theory. Exact 
knowledge must first be obtained. Seven out of ten busi- 
ness failures are due to a lack of definite, obtainable, busi- 
ness knowledge. 

A few months ago the writer had an interview with an 
employee of one of the life insurance companies doing 
business in the Pacific Northwest. He is what is known 
as a general agent whose business it is to go about the 
country and obtain local agents to sell life insurance. By 
a system of tests, measurements, and personal interviews
he is able so to choose his men that eight out of every ten 



chosen become successful insurance agents. He said that
if he could become 90 per cent efficient, that is, if he were
able so to choose his men that nine out of every ten chosen
would be successful agents, he would save his company
thousands of dollars each year.
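
The agent's reasoning can be put in figures. The sketch below counts the unsuccessful choices at the two rates of selection he names. The total of 100 agents chosen is a hypothetical figure, and since the text gives no cost for an unsuccessful agent, none is assumed.

```python
def failed_choices(agents_chosen, successes_per_ten):
    """Count the agents chosen who do not succeed."""
    return agents_chosen * (10 - successes_per_ten) // 10


AGENTS_CHOSEN = 100  # hypothetical number of agents chosen in a year

failures_at_eight = failed_choices(AGENTS_CHOSEN, 8)  # eight in ten succeed
failures_at_nine = failed_choices(AGENTS_CHOSEN, 9)   # nine in ten succeed

print(failures_at_eight, failures_at_nine)
```

On these assumed numbers, moving from eight to nine successes in ten cuts the unsuccessful choices in half, which is where the saving would arise.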

Nowhere may the struggle for efficiency be seen more
concretely than in industry, for no field has a more constant
and compelling motive: profit. Here are exhibited
new processes, new labor-saving devices, new methods of 
planning, more detailed instructions, more exacting 
records. Astonishing statistics show that the products are 
doubled and tripled as a result of those methods. There 
is a decrease in cost for the producer and an increased 
product for the worker. 

Waste of material, waste in effort, in energy, in lives,
and in property, is reduced to the minimum in order that
the profit may be large.

Wastes in Education Many and Varied. — Because of 
the many tasks devolving upon the public schools, they 
have become the foremost instrument of social economy. 

This added responsibility has increased the chances for
waste and inefficiency.

The startling revelation, made a few years ago, that 
our system of free education was failing to give even com- 
plete elementary schooling to the majority of children 
evoked imperious demands for more real facts. When 
thousands of children are reaching maturity, unskilled and 
unwanted, and are pointing an accusing finger at the school 
they were so glad to leave, the public begins to question
the returns for the great sums so lavishly expended on
the educational institutions. It begins a careful examination
of the waste products, the juvenile delinquent, the
youthful criminal, the wayward girl, and the unskilled
youth, all of whom are unwanted and ill-adapted for employment.
The school must measure quantitatively its




fundamental problems. It must account at every stage for
its raw material, its waste products, and its marketable
commodities. 

The purpose of the school is not that the child shall
learn, for he will learn without the school. Learning is
a spontaneous process which no lack of schooling can stop 
and no extent of schooling can do more than modify. Its 
purpose is to furnish conditions under which the child, 
through systematic and economic effort, will accomplish 
more for himself, and that which is accomplished will be 
of a better quality, and the product will be obtained by 
him in shorter time and with less expenditure of energy 
than if he learned it under other conditions. 

The principle of economy when applied to the individual
child presents many problems of school organization,
one of which is that the organization must be such that
each child shall have an opportunity for being at his best 
all the time in all the subjects. The school has but one 
purpose, the education of children. Consequently, in a 
strict sense, economic management in education can be 
defined only as a system of management directed toward 
the elimination of waste in teaching so that children attend- 
ing school may be duly rewarded for the expenditure of 
their time and effort. The point at issue here involves 
the discovery of processes that, other things being equal, 
will perform a given task in the smallest amount of time 
with the minimum expenditure of energy.

Four Economies in Education. — We seek economy in 
education from four different points of view: (1) economy 
through the quality of the product; (2) economy in the
quantity of the product; (3) economy in the time; and 
(4) economy in the expenditure of energy. Waste will 
not be prevented and the maximum accomplished unless 
these four economies move together, each in harmony with 
the other, and each, therefore, serving as a check on the 




other. Nothing may be gained by "robbing Peter to pay
Paul." One fundamental defect in pedagogical thinking
has been the overemphasis given to some one feature of
human development at the expense of other features no
less important. All interests must be carefully considered
before we can claim efficiency. In penmanship, for instance,
quality demands that the pupil be taught to write
well, but quantity, time, and the expenditure of energy
demand that he do not write too well. Quantity and the
writer's time demand that the pupil develop speed in
writing, but the time and expenditure of energy of the
reader are considerations that insist that he do not write
too rapidly.

In the matter of style, quality demands that the style 
be most legible. This demand was the cause a few years 
ago of our substituting the vertical system for the Spen- 
cerian. But vertical writing, though most easily read, 
failed to satisfy the demands of quantity and speed. The 
attempt to obtain the proper evaluation of these four
economies gives rise to experimental problems involving 
measurements. The efficiency of instruction will be determined,
not by developing one phase of the work at the
expense of the others, but by a development such that the 
sum of the net gains in the four economies shall be the 
maximum. 
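
The rule of the maximum sum may be sketched as a small comparison. The three methods and every figure below are hypothetical, invented only to show the reckoning; the text supplies no such measurements, and the function names are likewise assumptions.

```python
ECONOMIES = ("quality", "quantity", "time", "energy")


def total_net_gain(gains):
    """Add the net gains of one method across the four economies."""
    return sum(gains[name] for name in ECONOMIES)


def best_method(methods):
    """Pick the method whose summed net gain is the greatest."""
    return max(methods, key=lambda name: total_net_gain(methods[name]))


# Hypothetical net gains (negative figures are losses) for three methods.
methods = {
    "method_a": {"quality": 4, "quantity": -2, "time": -2, "energy": -1},
    "method_b": {"quality": 2, "quantity": 1, "time": 1, "energy": 0},
    "method_c": {"quality": 0, "quantity": 3, "time": 1, "energy": -1},
}

chosen = best_method(methods)
print(chosen, total_net_gain(methods[chosen]))
```

In this made-up case the method with the highest quality is not chosen, because its losses in the other three economies outweigh that single gain.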

Education should become efficient as industry is efficient.' 
This means that educators should know what the finished 
product is. They should be able to gauge the time, quan- 
tity, and value of the elements that entered into it. They 
should see that they actually produce what they say they
are going to produce. They must determine the proper 
ratio of product and time; they must define standards 
and eliminate waste; if they are to be efficient in con- 
formity to the world-wide ideal of efficiency, they must 
employ efficiency tools, one of the chief of which is measurement.
Education cannot assume the efficiency ideal without
adopting its concomitant, measurement.

III. Placing Education on a Factual Basis Through
Educational Measurements

If schools are inefficient for any one reason more than
another, they are inefficient because of the ignorance of
facts concerning their processes and products. Although
the problems concerning elementary education have confronted
the world for centuries and many educators have
attempted their solution, they are still involved in uncertainty
and indefiniteness. The statements made on simple
practical questions, even among our leading educators, are
conflicting to the point of absurdity. Educators are
divided into creeds. Those belonging to different creeds
are seldom in agreement even on the simplest processes
of educational procedure. Of course, there are some phases
of education that are matters of creed. What constitutes
good citizenship is a matter of creed, but it should be a
scientific fact whether the rules of spelling are helpful,
or what constitutes reasonable speed and accuracy in adding
a column of figures in the fifth grade, or what constitutes
a legible specimen of handwriting. Where everything
is "guesswork" and there is no proof to offer as to
who is right and who is wrong, it is little wonder that
the ship of pedagogy is "waterlogged in the sea of
opinions." Hume's description of the metaphysical
sciences of his day may not inaptly be applied to the condition
of current education. He said:*

Even the rabble without doors may judge from the noise and
clamour which they hear that all goes not well within. There
is nothing which is not the subject of debate, and in which men
of learning are not of contrary opinions. The most trivial question
escapes not our controversy and in the most momentous
we are not able to give any certain decisions. Disputes are
multiplied as if everything were uncertain; and these disputes
are managed with the greatest warmth, as if everything were
certain. Amidst all this bustle 'tis not reason which carries
the prize, but eloquence; and no man need ever despair of gaining
proselytes to the most extravagant hypotheses, who has art
enough to represent it in any favorable colours. The victory
is not gained by the men-at-arms, but by the trumpeters, drummers
and musicians of the army.

There is much speculation. "I think," "I guess," "It
is my opinion" are the characteristic phrases in educa- 
tion. "I know" is a phrase that has scarcely been ad- 
mitted. 

Two Kinds of Opinions and Their Uses. — Generally 
speaking, we may classify opinions under two general headings:
(1) expert opinion, and (2) the opinion of the layman.
By expert opinion we mean the opinion of one who,
by his long experience and study, is able to base his judg- 
ment on data usually not available to, or at least not 
possessed by, the average individual. For example, J. E. 
Wooters wanted to determine the dates and events that 
might be memorized most profitably by students in Amer- 
ican history in the seventh and eighth grades. He sent 
questionnaires to the members of the American Historical 
Association enclosing a list of 52 dates and requesting 
that the 20 most important dates in this list be arranged 
or "ranked" in the order of their importance. If other 
dates, not given in the list, were, in the judgment of 
those making the reply, more important than those sub- 
mitted, they were to be inserted. When the answers were 
compiled, the date 1776 ranked first in importance, 1492, 
second, and so on. These men presumably were giving 
expert opinion. Being historians by profession, it was 







presumed that their opinions were the conclusions drawn
from the best historic evidence available.*
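
The text does not say how the answers were compiled. One simple way, offered here only as an assumption, is a points tally: on each reply the first-ranked date earns the most points and the last-ranked the fewest, and the dates are then ordered by their totals. The three short replies below are hypothetical; only the dates 1776 and 1492 are named in the text.

```python
from collections import defaultdict


def compile_rankings(replies, places):
    """Order dates by total points; first place on a reply earns `places` points."""
    points = defaultdict(int)
    for reply in replies:
        for position, date in enumerate(reply[:places]):
            points[date] += places - position
    # Highest total first; ties are broken by the date itself.
    return sorted(points, key=lambda date: (-points[date], date))


# Hypothetical replies, each ranking three dates (the survey asked for twenty).
replies = [
    ["1776", "1492", "1607"],
    ["1776", "1607", "1492"],
    ["1492", "1776", "1789"],
]

compiled = compile_rankings(replies, places=3)
print(compiled)
```

A tally of this kind gives the combined opinion of the group rather than that of any one respondent.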

The opinions of both the experts and the laymen may 
be individual or they may be the combined, average, or 
median opinions of a group. For example, the judgment 
of three or four surgeons called into consultation as to 
whether a certain operation should be performed illus- 
trates group opinion. Generally speaking, group opinion 
is more reliable than individual, whether it be among 
experts or laymen. It is less liable to be affected by 
personal bias and superficial characteristics. 

By lay opinion we mean the opinion or judgment of the 
average individual who expresses himself on the various 
current problems and topics, scientific and otherwise, with
little or no data upon which to base his judgment. This 
type of evidence has been ruled long since out of court but 
not out of education. 

Although the opinion of the layman is of no value for 
scientific purposes, it is, nevertheless, a formidable factor 
in education, simply because the margin between technical 
information and lay opinion is so narrow that the educa- 
tional expert is not always able to convince the public 
that his position is right. 

Opinions Worthless in the Face of Facts. — In the 
absence of facts opinions reign supreme. In educational 
procedure mere opinion has had an all-too-important place 
on the programme. Where facts are available there is no 
longer any justification for mere opinion. The following
illustration will show the fallacy of basing any procedure
on mere opinion when the facts are available. Suppose
a group of 40 individuals were asked their opinions
of the length of a certain room. Let us suppose that each

* W. C. Bagley, "The Determination of Minimum Essentials in
Elementary Geography and History," National Society for the Study
of Education, Fourteenth Yearbook, Part I, pp. 139-140.




of them is an expert in judging lengths of rooms. Their 
individual judgments may range all the way from 45 to 
50 feet. Then suppose we take the average judgment of 
the entire group and we find it to be 48 feet. In the 
absence of a measuring stick this may be considered the
best measurement that it is possible to get. Yet it is
quite evident that this judgment may not correspond to
the facts at all. It is possible that a measuring stick
might show the length of the room to be 45 feet 8 inches.
It is also quite possible that no one of the experts judged
the length to be what the yard stick showed.
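
The illustration can be worked through in figures. The forty individual judgments below are hypothetical, made up so that they range from 45 to 50 feet and average 48 feet as the text supposes; the measured length of 45 feet 8 inches is the one the text gives.

```python
INCHES_PER_FOOT = 12

# Hypothetical judgments in feet: forty experts, ranging from 45 to 50.
judgments_ft = [45] * 4 + [46] * 4 + [47] * 6 + [48] * 8 + [49] * 10 + [50] * 8
judgments_in = [feet * INCHES_PER_FOOT for feet in judgments_ft]

# The measuring stick's reading from the text: 45 feet 8 inches.
measured_in = 45 * INCHES_PER_FOOT + 8

# The total divides evenly by forty for these figures.
average_in = sum(judgments_in) // len(judgments_in)
error_in = average_in - measured_in
exact_judges = sum(1 for judged in judgments_in if judged == measured_in)

print(average_in // INCHES_PER_FOOT, "feet average judgment")
print(error_in, "inches of error in the average")
print(exact_judges, "experts matched the measuring stick")
```

With these assumed judgments the group average is 28 inches too long, and not one expert agrees with the stick, which is the point of the illustration.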

Conditions analogous to this have prevailed in educa- 
tion from time immemorial. Either because we did not 
have the facts or did not care to go to the trouble to get 
them, false judgments have been allowed to stand. Education
and theology are two fields where opinions have
reigned supreme. The layman has been free to challenge
the judgment of the minister and the teacher because
neither could prove nor disprove the points at issue. The
facts upon which judgment should have been made have
not been available. Fortunately, however, assertions are
no longer the style in educational circles. The scientific
spirit of the age demands that assertions be backed up
with statistics based upon the results of experiments.
Ipse dixit will no longer suffice.

Those working to develop measurements realize that only
by getting the facts can education become an art and a
science, and its practice changed from a vocation to a
profession. Just as measurements and the possession of
facts developed astronomy from astrology, chemistry from
alchemy, and physics from mystery, so they will rescue
education from the sea of opinions and place her in the
family of sciences where she rightly belongs.

The reason why illusion and unverified hypotheses have
run riot through educational discussion has been the






.{{ . If or I' '>«- 'iiu» iitiifr.tiiri I n\'^ ?r:fti :i >oive Tiie ^irob- 
..'..r. ..' M'Mm t i' ■.vfntlwKPs i MKiHi « n :'5V''-QDioi?y, ^^ereas 
-..•♦. .l«»ii«i /ill t*!! Ill* mIp. \s 1, ■onseflucnce. 've iiave 
. 'II j4< iMicM f .htliiMinhicni •TMuiiins -t vhat shoold be 

!nifn<H:lvi« iVfptliiiilii 2lot TTjimI in Tedaipsqy. — ?«?dain)ary 

it,r«iupiii , » cnmrUnhlv mniimiiiiiH oaiiitioiu :or. OS the 
friMiif iiiiMif 'luu liMiits lu* VHv "o "he ieveiopnieiLt oi 
lir .-;rn«n\«j f f^iiclt" uiM ailiMi "o iiiorT vtiat t :iaa long 
irt*i\ • Villi im<sit«(inqr -ti •! Iu»r ^I'uuiTliii* "^TXTsuits. nameiy. 
•Tin iiiiiirfiti* nrt!i«i»i »i ;tui"iv. 'fs vork ^las 'onsiated 
»»* w|Miif*n-4 i.>Krti »n M^ini.Mis iiiii, herororc*. ^i i miLSS of 
MiHi- uli.'fiir- n:«tpr:Al \i» M'^ilv -instaiiieti "orTurci oiove- 
iii^Tif v»n V ^MiorriHi iiifil vnrlii*riair *~t*wK irr^ sxibieered 
'.\ «n!*i\-.i<i II Iii^ iiihf ^» *!i*Ar uui 'inmisTakniiie imira. 

*-»ti\ 'v,,!.-!' iiii inin'-4\iis. htui'.ir.iui vul ouiVfj jurwapd 
j^.\u*.f..j' wi *i'ii5iiw. ■lu'ss 'in' ■•■liiitir' it ititufariccal 
'.".•••,«-.•', •* *sf.*.''.-.w;:i\"t •»• ' v-:Tn:Mi» iiini. v'lii'ii ixt Ttti- 
/.A'l. V - /.vrv.j*.-: ;.i»'.«^v.«. v.:i*; i:; »;!.•• »tr.ii'arLr.TVL T •r'.'.twCuPe 

J'//'.v j^v/;.'. *>■♦ v^"...-:.— \^ vv.vi js* n'.ij:?~T*i':--j£. "When 

il<i5 :t ';^r;^, v/^ «>.ji/. ^.^^/^'-i* ;r.^.ijr.'i*tr'.-j:L r^i^^sTToik, and 

iirt>\',tY ftA rni'^.Tu fvr rfA>.j&T !t.j? v.* «<-i'AriorAl policies. 

Lines of Demarcation between Knowable Facts and
Philosophical Opinions Must Be Sharply Drawn. — It must
not be inferred that all problems in education are amenable
to measurement. As was indicated above, elementary
education presents two distinct types of problems: one
is involved in subtleties and belongs to the department
of philosophy; the other is more superficial and is
in a large part a question of science. The first includes
those factors which relate to the development of character
and the cultural phases of one's life. Culture emphasizes
the things of the mind and the higher life. It seeks to
beget the ability to enjoy the beautiful and the good wherever
these may be found. It seeks the ability to think
the best thoughts of the best men that we find enshrined
in literature. If one would make life worth living, he
must partake very largely of cultural things. These things
are not directly amenable to measurements, at least not
at the present time. On the other hand, the mechanical
skills, positive knowledge, most of the things taught in
the formal subjects, such as arithmetic, grammar, reading,
spelling, etc., readily yield to quantitative treatment.
It lies within our reach to ascertain the time consumed by
different teachers in obtaining certain positive results and
to discover what processes have proved most economical.

Data on Simplest School Room Problems Lacking. —
Anyone accustomed to the problems that come up for
solution in the course of a day's teaching can think of
dozens of questions for which he would like a definite
answer, as, for instance: What words should be taught
in spelling? How many? What methods should be used?
How rapidly should a boy in the fourth grade read, that is,
how many words per minute? What is meant by legible
handwriting? What dates should be taught in history?
Are rules beneficial in the teaching of spelling? Standard
tests and scales offer the same effective instruments of
research in dealing with these problems as the meter and
the gram offer to the student of physics. It is difficult
to see how anything can be done for the science of education
without them. At present we are absolutely unable
to form an intelligent judgment of how much time should
be required to teach any of the simple, formal things in
education. Educators are not yet agreed what words
should be taught, when to teach them, and how long it
should take to do it. It is in these fields that education is
inefficient.

Cultural Development Supplemented by Mechanical
Phases of Education. — While everyone engaged in
measuring recognizes that the highest product of the school
training is the development of ideals and character in
children, some are becoming convinced that successful work
in character-building is absolutely dependent upon the
successful work of developing the fundamental skills and the
formal phases of education. The studies that have been
made show that culture and inspirational work are conditioned
by the proper equipment of the child with the mechanical
tools by which all mental work is done. It is a mistake,
therefore, to think that the time devoted to the
mechanical skills is out of harmony with, and antagonistic
to, cultural education. On the contrary, it is equipping
the child with the tools for appreciating the beautiful and
the good.

No Excuse for Mere Opinion Where Facts Are Available. —
The time has come in education when mere
opinion will satisfy neither the modern school administrator
nor the public. Things in the business world are
measured and standardized with a precision not yet
reached in education. Even those things which a few years
ago were considered incapable of measurement are now
measured and standardized so that the public knows exactly
what is meant when a thing is spoken of as belonging
to a certain class or grade. For instance, oranges and
apples are standardized according to the variety and size.
When a grocer orders grapefruit and says to the salesman,
"Send me a box of 'sixty-fours,'" he knows just what to
expect and knows whether the order has been filled properly
when it arrives. In the case of shoes the average individual
may not be able always to tell the difference between
"firsts" and "seconds," but the expert can, and it isn't
mere guesswork on his part. He has perfectly definite
standards by which to judge.

If we ask a carpenter to build a room we give him a
set of specifications of the length, width, and shape of the
room that we desire to have built. When he has finished 
the job, we can take the specifications and check the finished 
product to see if it is done according to them. 

Any business man who is managing a successful business
in which he expects to make profits will have standards
from which he will work. He will have measuring
sticks to measure the efficiency of his output; he will have
units of accomplishment; he will have cost-accounting
systems; he will have standards of many kinds. He also
will have a continual system of testing and working-over
what he is doing to find out whether what he is doing
is the best that can be done with the money he has at
hand, and whether the output is as large as he could well
expect with the energy and money he has invested in it.
Why should we not run the schools on the same scientific
basis?

Answers to such questions as what a 12-year-old boy
in the seventh grade should do with a ten-figure-column
addition problem have been, until recently, a matter of
opinion, and usually of conflicting opinion at that. All
would agree that a boy of 12 should do better than a boy
of 10, but we have had no adequate conception until
recently as to how much better he should do. We have had
no objective standards of what a child should do at any
specified age. With the advent of educational measurements
we now are able to show what the best child will
do, what the poorest child will do, and what per cent of
children will be able to make a particular score.
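
The three quantities just named (what the best child does, what the poorest child does, and what per cent of children reach a given score) can be tabulated directly from a set of test papers. The following is only a minimal sketch: the scores are invented for illustration, and the function name `summarize_scores` and the target score of 8 are assumptions, not figures taken from any survey.

```python
def summarize_scores(scores, target):
    """Return the best score, the poorest score, and the per cent of
    children whose score is at least `target`."""
    if not scores:
        raise ValueError("at least one score is required")
    reached = sum(1 for s in scores if s >= target)
    return {
        "best": max(scores),
        "poorest": min(scores),
        "percent_reaching_target": 100.0 * reached / len(scores),
    }


# Hypothetical scores (problems done correctly) for ten 12-year-old boys.
example_scores = [3, 5, 6, 6, 7, 8, 8, 9, 10, 12]
summary = summarize_scores(example_scores, target=8)
print(summary)
```

With a table of this kind in hand, the statement that a boy of 12 "should do better" than a boy of 10 can be replaced by a comparison of two sets of figures.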






Biology, physics, chemistry, and psychology have grown 
by methods of analysis and measurements. Education 
must imitate this prudence if she would command the 
respect of scientific men. To make education effective and 
efficient we must have an eye for causes of inefficiencies.
We must discover the causes of success in some cases and
failure in others. Educational measurements offer impar- 
tial and impersonal evidence which cannot be refuted. 

Relation of Time Consumed to the Finished Product.
— In any well-regulated factory or business enterprise
of any kind there is a definite relation between the time
consumed and the amount of work done. The time it takes
to make the various parts of an automobile, or a wheelbarrow,
or a steam shovel, is known within quite definite
limits, and an employee is quickly brought to account when
this relation is seriously disturbed. How different it is in
the school business. The researches made in education
25 years ago by Dr. J. M. Rice shocked the educational
world by revealing the terrible state of ignorance among
educators of some of the simplest things in their profession.
They tried to laugh him out of court; to treat him
as an eccentric person trying to perform the impossible
by measuring things in human nature. Dr. Rice's own
account of the reception given him at the national meeting
of superintendents at Indianapolis, in February, 1897, is
both interesting and illuminating, and when contrasted
with our present attitude toward him, gives us much
encouragement because of the progress made. Dr. Rice
had published in the Forum of December, 1896, an article
entitled, "Obstacles to Rational Educational Reforms." In
reference to this article he says:¹

¹ Scientific Management in Education, pp. 17-18.

In a way that I had not anticipated I brought it directly to
the notice of the Department of Superintendence at its annual
meeting in Indianapolis, in February, 1897. I had been invited
to conduct a round-table discussion on the three R's, and had
expected a handful of people to talk the matter over quietly and
leisurely. But it so happened that the round-table turned out
to be a mass meeting, including the picked educational people
of the country. After a few opening remarks I endeavored
to arouse discussion on the question which I stated somewhat
as follows: In some cities ten minutes a day are devoted to
spelling for eight years; in others, forty. Now how can we
tell at the end of eight years whether the children who have
had forty minutes are better spellers than those who have had
only ten?

I had expected in this way to draw out the ideas of those
who believed in much teaching of spelling and those who believed
in little of it, and thus to labor for a compromise; but
to my great surprise, the question threw consternation into the
camp. The first speaker to respond was a very popular professor
of psychology engaged in training teachers in the West.
He said, in effect, that the question was one that could never 
be answered; and gave me a severe drubbing for taking up the 
time of such an important body of educators in asking them 
silly questions. 

The next speaker was a prominent superintendent, who did 
not like the way I had been treated and tried to come to my 
rescue. After this quite a number took the platform in response 
to calls from the audience and spoke on the spelling in a gen- 
eral way; but no one attempted to answer the question. 

It is interesting to note that when the same association 
of superintendents met fifteen years later in St. Louis 
they devoted forty-eight addresses to the subject of edu- 
cational measurements. 

We naturally should expect that the ordinary principles 
of arithmetic would operate in finding the relation between 
the time spent and the results obtained. We would expect 
that if twice as much time is spent on a subject in one 
school as in another, the ratio of the finished products 
would bear some such relation as two to one. Such, however,
was not the finding of Dr. Rice. He found classes
to whom spelling was taught incidentally just as efficient
as those who had forty minutes of daily drill.
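
The arithmetic behind this expectation is a comparison of two ratios: the ratio of time spent and the ratio of results obtained. The sketch below takes the ten and forty minutes from the question Dr. Rice put to the superintendents; the two average spelling scores are invented purely for illustration and are not Dr. Rice's figures, and the names `school_a` and `school_b` are hypothetical.

```python
def ratio(a, b):
    """Return a divided by b as a float."""
    return a / b


# Daily minutes of spelling drill, as in the question quoted earlier.
minutes_per_day = {"school_a": 10, "school_b": 40}
# Hypothetical average scores on the same spelling test (assumed values).
average_score = {"school_a": 72.0, "school_b": 73.5}

time_ratio = ratio(minutes_per_day["school_b"], minutes_per_day["school_a"])
result_ratio = ratio(average_score["school_b"], average_score["school_a"])

print(f"time ratio:   {time_ratio:.2f} to 1")
print(f"result ratio: {result_ratio:.2f} to 1")
```

If results were proportional to time, the two ratios would be about equal. When the result ratio stays near one while the time ratio is four, the extra drill has bought very little.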

The first forward step in a problem of this kind is to 
get the facts on what is actually being done. Then by a 
comparison of the things accomplished in the various 
schools, tentative standards may be set up. The goals 
reached by the best schools may be taken as the standard 
of efficiency toward which all schools should work. When 
standards are set and facts are gathered on the standing 
of any school, it becomes a comparatively easy matter to 
measure progress. 
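
The procedure described above (gather the facts, take the goals reached by the best schools as a tentative standard, then measure each school against it) might be sketched as follows. The school names and scores are hypothetical, and averaging the three best schools is an assumption made for the sake of the example, not a rule laid down in the text.

```python
def tentative_standard(school_scores, top_n=3):
    """Average the scores of the `top_n` best schools to get a tentative standard."""
    if not school_scores:
        raise ValueError("at least one school is required")
    best = sorted(school_scores.values(), reverse=True)[:top_n]
    return sum(best) / len(best)


def gap_to_standard(school_scores, standard):
    """Return each school's score minus the standard (negative means below it)."""
    return {name: score - standard for name, score in school_scores.items()}


# Hypothetical average scores of five schools on one test.
scores = {"A": 62.0, "B": 75.0, "C": 81.0, "D": 69.0, "E": 78.0}
standard = tentative_standard(scores, top_n=3)
gaps = gap_to_standard(scores, standard)

print("tentative standard:", standard)
for name, gap in sorted(gaps.items()):
    print(name, gap)
```

Repeating the same test later and recomputing the gaps gives a simple record of progress toward the standard.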

In the absence of facts on what ought to be accomplished 
and what is actually being accomplished people have had 
to guess as to the efficiency of the school systems. If, in 
the opinion of school officials or the public, the schools 
are deficient in some particular phase, the usual procedure 
is to give more time to that part of the work, even though 
the supposed weakness may actually be the strongest link 
in the chain of processes. A superintendent thus confronted
must do something to meet the onslaught of criticism
directed against his school. Not knowing a better
thing to do, he adds a little more time to that phase of
the work that has been pronounced inefficient, and trusts
to luck that what he has done will satisfy the public and 
do no harm to the school. 

The following, taken from the preface of the Salt Lake
City Survey, illustrates the point in question:

During two or three years preceding 1915 a certain amount
of general criticism developed in Salt Lake City with reference
to the work of the schools and the efficiency of the instruction
and supervision. The harmonious cooperation which had previously
existed between the Board of Education and the Superintendent
of Instruction came to be somewhat impaired and
the confidence of the citizens in their schools somewhat shaken.
In particular, the rather common complaint was raised that the
administration of the schools was not efficient and that the
instruction in the fundamental school subjects was not producing
the best results. The superintending authorities did what
they could to meet such criticism by increasing time allowances
and similar measures but without appreciable results. Finally
the superintendent of the schools for the city recommended to
the Board of Education that a survey be made.

The survey showed that the Salt Lake City schools were 
strongest in the very phases they had been considered weak- 
est by the public. 

An opinion circulated in the educational field soon be- 
comes the equivalent of a law. For instance, the idea 
was circulated that spelling was a matter of heredity and 
the heirs to this wonderful faculty were very largely con- 
fined to the upper classes, yet Dr. Rice found in giving 
his spelling tests that the highest grades made by any 
children tested were made by children whose parents were 
Bohemian cigarmakers. 

In arithmetic he found the children in the slums of
some cities doing a great deal better than those from the
best districts in others.¹

¹ Ibid.

The Public Is Asking for a Ledger Account of Our
Business. — When the attitude of the public is impartially
considered, one must come to the conclusion that it has
been both tolerant and liberal. The debit side of the
ledger shows the time and money expended, the number
of teachers hired, the books and apparatus provided, the
buildings built, and the courses of study made. On the
credit side is recorded in vague, meaningless adjectives,
and numbers which follow no known rules of mathematics,
the standards reached and the progress made. The public,
and even the teachers themselves, do not know what
"excellent" means in history, what "A" means in algebra,
or what "95 per cent" means in Latin. In the case of
Latin, for instance, a father would have a pretty hard time
telling the relative amount of Latin his two sons knew
if one were given a grade of 90 per cent and the other a
grade of 45 per cent. Applying the ordinary rules of
arithmetic he might expect one to know twice as much
Latin as the other, but such probably is not the case. A
boy may take home a grade of 90 per cent in arithmetic
or a grade of 70 per cent in penmanship year after year,
and when his parents sign his monthly report card they
do not feel very much of a thrill over the knowledge they
have gained. With practically no standards by which to
work an accountant would have a pretty hard time auditing
the books of a system of public schools and enlightening
the public as to their efficiency.

In the school business we have a state monopoly in
which there is almost no accounting. We have compelled
the children to come to school. We have made courses of
study which we felt were good but which were very often
based on conditions past, rather than on present and future
needs. After the instruction has been given to the children
we have been content to trust to the growth processes
to bring out the kind of development hoped for.
Professor Cubberley likens the methods employed in school
to the old-fashioned luck farming, where the farmer looked
at the moon, guessed at the weather, put in his crop, and
prayed to the Lord to pull him through another season.¹

¹ Ellwood P. Cubberley, "The Significance of Educational Measurements,"
Third Annual Conference on Educational Measurements,
Bulletin No. 6 (Indiana University, 1917), p. 7.

Inability to Show Facts Has Worked a Hardship on the
Teaching Profession. — No one knows the number of good
teachers who have lost their positions because they did
not have facts — hard, cold, tangible facts — to prove to the
public and the school officials that their work was meritorious.
They have been dismissed many times because
the demands were unjust. Unless a teacher has a correct
idea as to where her pupils are, educationally, when they
come to her, and what their native capacities are, how
can she or the public or the school officials know what
progress they have made? A supervisor may go into a
grade room and say to the teacher, "You are doing poor
work in reading." The teacher may respond by saying,
"No, I am not. I am doing good work in the subject of
reading." In the absence of a measuring stick or a standard
of good reading how is one to tell who is right? It
is simply the supervisor's opinion against that of the
teacher. If, however, some standard has been set such
as that made by Professor Courtis, the teacher may say,
"Let us measure the pupils and see." If they can read
with the speed and comprehension set as the standard, the
supervisor will have to withdraw his statement and the
teacher is sustained. On the other hand, if the class is
not up to the standard, the test will quickly show it. The
test being impartial and unbiased, the teacher can have no
complaint to make against the supervisor for unjust
demands.

As soon as school officials recognize the fact that measure- 
ments define for them just how much may reasonably be 
demanded, they will be unafraid of measurements. They
will learn the administrative lesson that it is better to know
for purposes of ordinary routine what ought to be demanded,
than merely to guess at conditions.

The business man used to be offended if anyone criticized
his methods or commented on his results. Now he knows
that his best friend is the man who comes and tells him
exactly where he stands. The one thing a successful business
man cannot approve of to-day is ignorance about the
results of his business. He does not fool himself any more,
come what will of the revelation. [...]
school is weak and where it is strong is armed against
criticism. More than that, he is guided in his future efforts.
Some superintendents have feared the facts in reference
to their schools because they knew that the number of imperfections
revealed in the survey would be appalling. One
has considerable hesitancy in going to the dentist knowing
that his teeth are sure to "go to pieces" under the
keen scrutiny of the expert. Yet his better judgment tells
him to go. In the presence of a popular demand for the
revelation of the imperfections and the absolute certainty
that imperfections exist, it is not difficult to understand
why there is a tendency on the part of many school officials
to combat the movement towards widespread measurements.

The demand for measurements is likely to be especially 
keen if there is some parent in the community who does 
not like the superintendent or the principal. Such a 
parent may desire to "show up" the inefficiency of these
officials. He never believes for one moment that the
responsibility for unsatisfactory school work may be traced
to the native limitations of his child or to the home atmosphere
in which he grows up. Such a parent is sure that
measurements will detect at some point a lack of perfection
that will give his dislike for the school officer the
sanction of science. 

In spite of the imperfections, the American people are
willing to pay and pay well for any reasonable project in 
education. Their liberality cannot be questioned even 
under present conditions. They may be made to go to 
almost any limit if given the facts. 

IV. ESTABLISHMENT OF DEFINITE STANDARDS

It is obvious that one must have some kind of a standard
before anything may be judged. If one says he has
a good suit of clothes, he is basing his judgment on some
standard or standards. It may be good because it will
wear well, or because it fits perfectly, or because it holds
its shape well when pressed. The merchant in selling the
suit may have called attention to the color; that it will
not fade; that it is all wool; that the weave is of the latest
design, etc. The purchaser takes the data submitted by
the merchant and measures the facts by his standard of a
good suit. He knows that a good suit will wear well; that
it must be all wool; hold its shape well; fit perfectly, and
be pleasing as to colors. There will always be differences
of opinion as to which of these qualities should have the
most weight. For instance, two suits may be of the same
price, but one is made of better material than the other,
while the second may be more pleasing as to color. One
customer may prefer to take the poorer quality of goods
in order to get the proper colors or weave, while another
may place practically all the emphasis on the wearing
qualities of the garment. Experience has taught the
purchaser what his standards should be.

When we make a similar comparison to a school system,
we find conditions decidedly changed. In the first place,
the school official cannot furnish the individual about to
judge a school system with the facts as the merchant could
the purchaser of the suit of clothes. In the second place,
he has not had, until recently, any way of knowing what
a school system, or a particular grade in a system, ought
to do. Suppose he were told the sixth grade could read
a certain selection at the rate of 150 words per minute
and could reproduce 80 per cent of the ideas found in it.
Would he consider this a model class as to reading, a
mediocre class, or a poor one? Or suppose he were told
that in a certain school system the multiplication tables
were taught to a beginning fourth-grade class in twenty
lessons so that 95 per cent of all the children in that grade
knew them perfectly. Would this be considered good,
mediocre, or poor as far as time is concerned? In other
words, what is the standard as to time in teaching the
multiplication table?

A bricklayer has a standard and knows how long it 
ought to take to lay 1,000 bricks. A man shingling or lathing
a house knows how long it ought to take to cover 100
square feet of surface, or how long it takes to nail on 1,000 
shingles or laths. All such work is broken up into definite 
units of accomplishment, and standards are set for each 
unit. 

There are hundreds of things in education that might 
be broken up into definite units of accomplishment, and 
standards of achievement set for each unit. In our country, 
where elementary education is characterized by the absence 
of system, it is not unusual for individuals, whether edu- 
cators or laymen, to examine a class with a set of questions 
selected in an arbitrary way and judge by the results 
whether or not the teacher has done satisfactory work. So 
long as we have no definite standards, judgments based on 
the results of an examination may continue to do a gross 
injustice in estimating both the qualifications of the
teachers and the value of the methods employed by them. 

Three Kinds of Standards Needed. — At least three 
kinds of standards are needed in education: (1) standards 
as to quantity; (2) standards as to time; (3) standards 
as to quality.¹

¹ W. W. Black, "The Movement for Greater Economy in Education,"
Second Annual Conference on Educational Measurements,
Bulletin No. 11 (Indiana University, 1915), pp. 7-12.

1. Standards as to Quantity. — The first step toward placing
elementary education on a scientific basis must necessarily
lie in determining what results reasonably might be
expected at the end of a given period of instruction. Of
course, the first thing that determines the standard must
be the degree of capacity that the average child has for
the type of training that we want to give him. We cannot
fix the standards irrespective of the child. The theoretical
ideal of perfection must be overthrown and rational demands
made, not on what an ideal child should do, but
on what the average child, as he comes to the teacher, is
expected to do. Then we can venture to tell the parents
with assurance that their children in the fifth grade, for
instance, are as good as the average even if they misspell
50 per cent of the words in a certain spelling test.

The standard set will be subject to many conditions and 
will have to be justified from many points of view. It 
will be open to question from the point of view of economy, 
organization, and method. Because of the interrelation of 
these different factors in determining efficiency, such ques- 
tions as the following arise : Does the standard demand too 
much or too little in time and energy? Could the in- 
dividual, under a different organization and method, meet 
the requirements of the standard more economically? 
Could he, under a different method, raise the standard with 
the same expenditure of time and energy? Is it desirable 
to raise the standard? Is the method such that the pupil 
has developed a sufficient degree of ability in applying
the knowledge indicated in the standard? 

Definite standards should be required in determining
and checking the applications of the principles of method.
When our standards as to quantity are determined, the
teacher can tell exactly when the child's mechanical work
is completed. This would enable him to determine the
amount and character of drill work to be done.

It is not what the teacher does that counts. It is what
the child does and thinks, and, until our work is organized
to take advantage of these factors, we cannot hope to
improve the efficiency very much. We must know exactly
what is meant by "satisfactory results"; then no time
would be wasted when the goal is reached. Many pupils
have been kept at writing and other mechanical drills
long after satisfactory results have been reached simply
because the teacher did not know when the pupil arrived.
There is a great deal, too, to be said for definite standards
for the psychological effect they have on both the pupil
and the teacher. Each likes to know how well he is doing
and how near he is to the goal.

It may be argued that it would be impossible to secure
a definite standard for measuring results generally
applicable in our country on the ground that the needs
of our people vary in different localities. While this sentiment
deserves recognition, it will become apparent, during
the course of this chapter, that proper attention to local
conditions in the conduct of our elementary schools would
not tend in the least to alter the measurement plan as a
whole.

2. Standards as to Time. — When our standards as to
quantity have been determined, our attention may then 
be directed toward the discovery of short-cuts in educa- 
tional processes that will save the child's time. All would 
agree that we should devote a reasonable amount of time to 
get a reasonable result. But what constitutes a "reason- 
able time" is the unsolved problem. To arrive at a con- 
clusion in this matter we must find how much time has 
been given to a subject in the best schools where reason- 
able results have been obtained and make our calculations 
accordingly. 

Because of a lack of standards, we no doubt have expected
far too much of pupils on some occasions, and on
other occasions they could have done twice as much very
easily. William James has pointed out that the average
man lives far within his limits and possesses powers of
various sorts which he habitually fails to use. There is
little doubt that the requirements of school children are
within their limits in many instances, and that they
could easily do two or three times the work they now do
in the same time.

3. Standards as to Quality. — Much already has been said 
about quality; so a short discussion will suffice. When a 
child can write a specimen of handwriting equivalent to 
Quality 13 or 14 on the Thorndike Scale, for instance, he 
is said by competent judges to write well enough for all 
practical purposes. Any further time spent in improving 
quality above this point is a questionable procedure unless 
it is definitely known that the student will be called upon 
to do clerical work that may require a higher standard. 
The point is, that standards give focus and direction to 
the work and fix a point above which drill may not be 
profitable. Many boys and girls in school whose hand- 
writing is equivalent to Qualities 14 to 17 on the Thorn- 
dike Scale are required to practice daily when their arith- 
metic and grammar need their time much more. 
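
The use of a quality standard as a stopping point for drill can be put in a few lines. This is a sketch under stated assumptions: the threshold of 13 is taken as the lower figure of the "Quality 13 or 14" named above, the pupils and their ratings are invented, and the function name `needs_handwriting_drill` is hypothetical. A pupil known to be headed for clerical work could be checked against a higher `required_quality`, a value the text does not specify.

```python
# Assumed threshold: the lower bound of "Quality 13 or 14" on the Thorndike Scale.
SUFFICIENT_QUALITY = 13


def needs_handwriting_drill(quality, required_quality=SUFFICIENT_QUALITY):
    """Return True if the pupil's rated quality is below the required standard."""
    return quality < required_quality


# Hypothetical pupils and their rated handwriting quality.
pupils = {"Ann": 11, "Ben": 14, "Cora": 17}
drill_list = [name for name, q in pupils.items() if needs_handwriting_drill(q)]
print(drill_list)
```

Only pupils on the drill list would continue daily practice; the others could give that time to arithmetic or grammar.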

Standards Changing. — Some object to standards on the 
ground that they are constantly changing and for that 
reason are really not standards at all. The argument is 
poor, however, because that is exactly what we should 
expect if progress is to be made. That is what happens 
in business life everywhere. While standards do change,
yet they are stable enough to justify their determination 
and are absolutely essential to the best school work. As 
a matter of fact, all teachers have them but they are sub- 
jective and for that reason are not as valuable as they 
ought to be.

V. PROFITING BY THE EXPERIENCE OF THOSE IN OTHER
OCCUPATIONS AND PROFESSIONS

In the manufacturing world inventions and improve- 
ments are usually made in one of two ways: A manu- 
facturing firm may "go to school to its competitor" and 
learn new ways of doing things, or it may offer a reward
to its employees who can devise new and better ways of
doing things. Any patent which is a labor-saving device
is quickly appropriated by men in all lines of business.
There is no hesitancy in appropriating new ideas irrespective
of their source, the only question being, Will they
work? In the recognized field of science, such as physics,
chemistry, medicine, etc., the members of the profession 
are not only willing to learn from each other but they are 
compelled to do so under penalty of law. Those who fail 
in practice to give due recognition to important discov- 
eries are held responsible for the consequences. 

If education is to become a science it must utilize scien- 
tific methods just as other businesses and professions have 
done. We must discover some truths in regard to educa- 
tional processes which, if ignored by the teacher, will make 
him liable to prosecution for malpractice just as the 
physician who has bungled the setting of a bone.

Teachers make mistakes year after year simply because 
no record is made of their procedure and because there 
are no signposts to guide the new ones away from the pit- 
falls. Successful businesses and professions are not 
operated in such a haphazard manner. 

Our government maintains a Bureau of Standards at 
Washington with exact units for aU measures of mass, 
length, and time, with a large number of derived units. 
For all current work in physical science these standard 
units are essential and without them modern physics and 
chemistry could not exist. Every new discovery or in- 
vention in science must be stated in terms of them, and 
every physical and commercial enterprise in modern civil-
ization is absolutely dependent on them. It is not surpris-
ing that a physicist with highly refined instruments for
physical measurements should look doubtfully upon the
yet immature efforts of measuring mental qualities and 
mental achievements. Would it be possible in education
to get the same kind of measurements we have in physical 
science, as measurements in length and weight? We can
tell a boy's stature in a perfectly definite way so that
everybody in the world will know what we mean. We
can tell the weight of a person or the number of pounds
he can lift, and the world accepts our measures under-
standingly and without question because the units have
been standardized.

The following examples from the business world indicate 
the great painstaking care that successful business takes 
to see that efficiency is maintained throughout its system,
and how responsibility is placed on each one in the system 
to carry out his particular part of the programme. 

In the Middle West a great corporation manufactures 
stoves in large quantities. The shop history of each stove 
is kept in detail from raw material to the shipment of 
the finished product. The name of each workman 
responsibly concerned in the inspection of the raw material, 
in the making of any part, in the examination and ship- 
ment of each part, is recorded. A complaint from the 
purchaser immediately locates the responsibility upon the 
workman at fault. The article upon which all this pains-
taking care is lavished sells from fifteen dollars up. The
method of successful business makes it possible to place 
responsibility where it belongs. The public schools might 
well imitate this prudence. 

The Manufacturer Has Specifications. — The manufac-
turer knows exactly what he is trying to do. He draws
his specifications according to the needs of the market 
he is trying to supply. He knows how much of his out- 
put is efficiently manufactured and how much raw material 
goes to waste in the dump pile. His object is to obtain 
a more satisfactory output so there will be a larger profit
for the business.

The ideals and processes of scientific method in educa-
tion are in salient respects similar to those that are re-
shaping the processes of industry. In education as in
industry the scientific idea is, at base, analytic scrutiny, 
exact measuring, careful recording, and judgment on the 
basis of observed facts." 

The school principal who tests his school is guided in 
his future efforts. The fact that adverse criticism causes 
no shock is of some importance, and the positive result 
is that the school is stimulated to improve itself. We have
awakened to a startled realization that, in education as 
in other forms of organized activity, applied science will 
do the work much better than the old trial and error 
methods. Even those processes that have rested secure 
in the sanction of generations may be improved by the 
application of science. 

In dealing with the application of scientific methods 
to education and to industry we must ever bear in mind 
two fundamental distinctions. These relate to the use of
time and the types of product in these two kinds of activi- 
ties. In industry the finished product conforms to a definite 
pattern. It is a constant. The variable is the time. If 
the finished product is to be a wheelbarrow of a certain 
design, the problem is, How long will it take to turn out
a wheelbarrow according to the specifications laid down 
in the pattern? The task to be done is always definite. 
In education it is different. The time is a constant, eight 
years in the grades, for instance. There is no definite 
pattern according to which we try to mould the lives of 
the boys and girls. Their native capacities, aptitudes, and 
proclivities are different. We expect training to develop 
more differences than fewer. These fundamental differ-
ences must be kept in mind when business methods and
education are compared."

"Making Education Definite," ibid., p. 87.



VI. Meeting Some Objections to Educational Tests 
and Measurements

In launching any great campaign for reform in method
and ways of doing things, one must expect considerable
opposition not only from those engaged in the business
but from outsiders as well. Meeting the arguments against 
such movements becomes, therefore, as vital a factor in 
getting the new methods in operation as some of those 
which are very much closer to the real problems to be 
solved. In this part of the discussion we shall note some 
of the objections which have been raised in reference to 
tests and measurements in education. 

Educational measurement, like all new movements, has
its scoffers and its zealots. It is, of course, easy to point 
out imminent dangers in measurements. Like all new 
movements it will have to run the gauntlet of criticism. 
Measurement in education presupposes commensurable 
quantities, yet it is known that many educational products
are incommensurable. There are spiritual and material 
products in education. Over the former there will ever 
be the veil of mystery and doubt. Man spiritually has 
always been the enigma of existence and will continue to
be so. We can never plot the curve of genius nor measure 
the unit of inspiration. But there are a vast number of 
educational products that are measurable. The day of
the educational engineer is at hand. The day of educa-
tional opinion will go, and the sway of dominant person-
alities in education will be limited by facts.

When a new movement is launched, the best thing to
do is not to endorse it, or condemn it, but to study and
understand it. This is not what always happened in edu-
cation, however. But the opposition to change is not
peculiar to education. It makes its appearance in other
professions and industries. The important thing is to be
able to meet the objectors with hard, cold, irrefutable 
facts that will prove their contentions to be wrong. The 
objections to tests are many and varied. We shall note
a few of the more common ones. 

1. Tests will not endure. — Some opposing the testing
movement say that tests will not endure the ravages of
time and change; that they are of temporary value, and
those we are using to-day will be "scrapped" for some-
thing else to-morrow. In answer to this objection all we
can say is that we do not expect them to endure in their
present imperfect form. Progress is made by casting aside
the outgrown and imperfect tools for the more modern
instruments. Other sciences have progressed in this way.
The pseudo-sciences of ancient times supplied the founda-
tions for the later scientific achievements. In any advanc-
ing society, old knowledge, old philosophies, and old cul-
ture must constantly be in a state of reconstruction to
keep pace with the race's progress.

2. The child mind is too complicated to measure. — There 
are those who say that the mind is complicated by so many 
elements that enter into its development that no definite 
conclusions can be drawn. They are supported in this 
view by the fact that even broad-minded teachers of wide 
experience differ on the most elementary points coming 
under their daily observation. And this further item may 
be mentioned in their favor, that even the same teachers 
are constantly changing their views; they no longer be-
lieve in one year what they firmly believed the year before;
and a year later they will begin to feel that their second 
theory was wrong and their first was right, and so on 
indefinitely. This state of affairs comes about because 
judgments have not been based on facts but on mere 
opinions. Whatever the reasons may be for changing their 
opinions, educational measurements tend to stabilize them 
because they are based on facts. 




Some teachers, for instance, will tell us we cannot
measure thought progression in English composition; never-
theless those same teachers mark one composition 75 per
cent, another, 77 per cent, and a third, 74 per cent. When 
they have had 75 per cent as a passing mark they have 
continually refused to promote those who were given a 
grade of 74. 

3. The judgment of a competent man is better than 
scales. — A strong objection is made to scales on the ground
that the common-sense judgment of a first-rate man is 
better without these units and scales than the action of 
a stupid or incompetent man with them. The important 
thing is not whether a first-rate man can do better with- 
out such scales than an incompetent one with them, but 
rather that the efficiency of each is increased. On the 
other hand, it is precisely the work of science to get good 
work done by those who are rather mediocre. Thanks to 
the progress of science, we can now solve problems that 
Aristotle could not. We would all prefer to have a stupid 
doctor of to-day who, nevertheless, understood the use of 
antiseptics and antitoxins than Galen or Hippocrates, 
though in respect to common sense there would be no 
choice. 

The dangers of measurements are as real and imminent 
as the advantages are self-evident. Dangers will arise from 
the mass of superficial and erroneous results that will 
certainly be presented to the educational world in the guise 
of scientific contributions applied to pedagogy. But we 
must welcome all these contributions, challenge them, and
attempt to verify them. The sciences cannot escape in- 
jury if their results are forced into the rush of the day 
before the fundamental ideas have been cleared up and 
an ample supply of facts collected. 

4. Tests tend to reduce all educational work to a dead
level with no allowance for individual differences. — As to
the tendency to obtain uniform results, the effect is meant
to be just the opposite to uniform. Our surveys are giv-
ing us a knowledge of conditions, both in school and out
of school, under which the individual child lives. Physical 
tests will tell us more accurately than we have been able 
to know heretofore the ability of each individual. With
this accurate knowledge of the individual child, there is
no way open but to make his self-furnished standards the 
guides in his case. This charge of reducing all individuals 
to a dead level is born of a complete misunderstanding of
the aims and processes of the new method. Its aim is not 
uniformity but individual development. People fear lest 
educational experiments will make children lose their spon-
taneity, or spirituality, or something of the sort. They
feel a certain skepticism about tests and scales. We should
not care much for the teacher who did not have a great 
interest in the finer, subtler qualities of character and 
tastes. At the same time an artist can preserve all his 
interest in the finer, subtler qualities of taste and still 
use the compass points to measure his distances as he
draws. "Certainly, that old rebuke made in the first days 
of child-study movement that if we were at all scientific 
with our children we would come to love them less is out 
of place. It is the mothers who love their babies most who 
weigh them oftenest." The mother who weighs her baby
every month, and to an ounce, is in these days precisely
the mother who loves her baby on the average as well, if
not better, than others. 

Edward L. Thorndike, "Units and Scales for Measuring Educational Products," Proceedings of a Conference on Educational Measurements, Bulletin No. 10 (Indiana University), pp. 128-141.

Teachers must not expect tests to do everything for
their schools. The rather mediocre intellect of youth which
repels information will not blossom out overnight by some
sort of hocus-pocus of measurements into a marvelous
genius who will learn anything set before him. A part 
of the trouble has been the fact that our specifications 
have not always been dictated by the needs of the future. 
Too often they have been framed by those who are think-
ing in terms of the past. We wish to train the children
in our schools that they may earn efficiently and com- 
fortably their daily bread, but most of us want them to 
become more than skilled bricklayers, carpenters, iron- 
workers, stenographers, telegraphers, more than capable 
physicians, lawyers, and merchants. If practical efficiency
is all the American schools can do for a child, they are
certainly far from efficient.

Education will, of course, always need its poets, its
artists, and craftsmen, as well as its managers and men 
of science, but it needs these also. There is no reason 
why the artistic life should be impeded by measurements.
Of course, we cannot subject all human life to mathe- 
matically accurate tests, certainly not in the near future. 
Human life is deeper than all our mathematics. 

This agitation for better-measured products is not con-
fined to education. Commerce, industry, politics, philan-
thropy, and other great fields of endeavor, are similarly
agitated. Before the impact of oncoming generations, long- 
worshiped and long-cherished idols are falling. The old 
order yields slowly. It rejects the new as propaganda.
Thus the tide of conflict moves back and forth in the great
fields of human endeavor, disclosing in the lull of struggle 
the inevitable process of change. 

5. Tests measure so small a part of intellectual life that
they are not indicative of general ability. — An educational
product is usually complex, and its measurement is as 
difficult as measuring an elephant. There are many char-
acteristics to be taken into consideration. We do not make
a complete measurement of the total fact, but we measure
the amount of some feature. Every measurement rep- 
resents a highly abstract and partial treatment of the 
product. Many critics who object to tests and scales do 
not understand this. An educational product invites hun-
dreds of measurements. In making an automobile each
part is measured in its own peculiar way. Linear measure 
is used for length, cubic measure, for volume, and the 
strength of the steel is tested by another unit. In exactly 
the same way the traits, characteristics, mannerisms, 
powers, and skills of an individual would have to be sub- 
jected to many kinds of measurements before their strength 
and quantity could be determined. If a teacher is a good 
one, she is good because of the presence or absence of cer- 
tain powers, traits, skills, and mannerisms, or she is good 
because she has these characteristics in a certain propor-
tion. Dr. Thorndike has pointed out the fact that any-
thing that is, exists in some quantity. If it exists in some
quantity, it can be measured. Perhaps we cannot measure 
it to-day but we may be able to measure it some day. 

People have said, and rightly, that we can never deter- 
mine mathematically the degree to which a strong man 
and a noble woman influence for good the character of 
the pupils. But they overlook the fundamental truth that 
in education, as in other pursuits of life, character and 
efficiency go hand in hand. As school executives make 
practical application of the newer scientific tests, no fact 
stands out with more impressive distinction than that the 
teachers whose classes make the best record are the teachers 
who are the most truly successful in the shaping of char- 
acter. 

VII. Keeping an Accurate Record of All Methods Tried
and Progress Made

Measuring progress implies a starting point and a goal.
The significance of this fact should not be overlooked in
education. Attempting to measure progress in school work, 
when either the starting point or the goal is unknown, 
would be like attempting to measure the progress of one 
"joyriding" about a city with no definite goal in mind.
All that could be said is that he had been out one hour, 
two hours, or some other length of time. If, however, 
one were driving an automobile from Portland to Seattle, 
it becomes an easy matter to measure progress, because 
there is a definite starting point and goal. A definite
point of departure and a definite goal are also needed in
education if progress is to be measured. 
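
The point may be put in the form of a small computation. What follows is only a minimal sketch under stated assumptions: the function name `progress_fraction` and the scores 40, 55, and 70 are hypothetical and do not come from the text. It shows that a measure of progress cannot even be written down until both the starting point and the goal are fixed.

```python
def progress_fraction(start, current, goal):
    """Return the share of the way from start to goal covered so far.

    All three values must be in the same unit (miles, test score, etc.).
    The names and the formula are illustrative assumptions only.
    """
    if goal == start:
        # With no distinct goal there is nothing to measure progress toward.
        raise ValueError("start and goal must differ to define progress")
    return (current - start) / (goal - start)


# Hypothetical scores: 40 at the opening of the term, 70 set as the goal,
# 55 at mid-term.
print(progress_fraction(start=40, current=55, goal=70))
```

With these assumed figures the pupil has covered half of the distance to the goal. The "joyrider" of the text corresponds to the case in which no goal can be supplied at all, so that only elapsed time can be reported.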

VIII. The Cultivation of the Confidence and the Utili-
zation of the Support of the Public
A new kind of confidence must be created on the part 
of the public in our schools. This confidence constitutes
the capital with which the efficient school system must 
develop its dividends and activities. In devising educa-
tional procedure, therefore, we must constantly have in
mind the intelligent and public-minded portion of our
citizenship. As education grows scientific it tends to be-
come less intelligible to the public. With a growing tech-
nical terminology the educational thinker tends to speak
a dialect difficult for the ordinary person to comprehend.
The result is that as education has become more scien-
tific it has tended to isolate itself from the understanding
of the people. 

Teachers Must Know Why Sweeping Changes Are
Made. — It has been said that there is a growing tendency
for some city superintendents in large and somewhat cen-
tralized systems of public schools to make sweeping changes
in schoolroom procedure without consulting, and what is
more serious, without attempting to convince their teachers
of the necessity for such changes. There can be no true
profession of teaching where most of the members are
required to carry out official orders mechanically and
blindly. A clear understanding of underlying principles
is essential to good teaching. In the art of administering 
to the intellectual and moral awakening of childhood, the 
spiritual worker should fully comprehend the meaning of 
the plan and should have a familiarity with the tools with 
which he works. Successful educational leaders are realiz- 
ing more and more the necessity of carrying their teaching 
staff with them in all progressive reforms, not simply as 
a matter of respect for the teachers, but as a matter of 
necessity in getting the essential work of the school done. 
It is necessary that the thought of the leaders be trans- 
mitted to the rank and file, to all trainers of youth, to 
parents as well as teachers. Our educational institutions 
are not made by imperial edicts and bureaucratic decrees 
handed down from educational experts. They must come 
from the people. It is not sufficient that educational
leaders alone should know the significance of a given re-
form or movement. The public must understand and
accept the proposed policy. The intellectual channels be- 
tween leaders and followers, profession and populace, must 
be kept open. Both teachers and public must know the 
limitations of the school. The leaders can advance no 
farther with an educational movement than the public will 
endorse. Tests and measurements, however technical in 
character, must not have their significance clouded. The 
public wants to understand its educational system and 
what it is supposed to do. The ultimate worth of the 
principles and devices must be measured by the extent 
to which they are within the comprehension of the typical 
layman. Our aim must be the bulwarking of the cause of 
public education by a common confidence in the schools. 
This new confidence cannot be sustained on vague theories 
nor upon standards that are established in defiance of
either scientific procedure or common sense. 




The Public Is Interested in Education. — If our schools
are to have credit for their work, this new confidence must 
obtain. It is true that people as a whole are not inter- 
ested in pedagogy because they do not understand it, and 
they are not in sympathy with the pedagogues because 
they do not understand their subtle minds. But they are 
intensely interested in education. It is a mistake to think 
that the lack of educational progress may be attributed 
to public indifference and its consequences. The people 
are willing to dip down into their pockets to almost any
depth with reverence and, as a rule, without the slightest 
murmur. 

The teaching profession has done but little to demon- 
strate to the public that what it was doing was worth while. 
Results have always been recorded in vague, intangible 
adjectives that the public could not understand. That
the public has never taken an intelligent interest in its 
schools is not its fault, but that of the educators them- 
selves. For how can the public be expected to distinguish 
the true from the false when the leaders in the profession 
do not agree on the simplest fundamentals? 

The ultimate cause of the lamentably slow progress 
toward the introduction of educational reforms may be 
traced, therefore, beyond the province of the general pub- 
lic into the professional circle itself to an inner strife and 
turmoil consequent upon the uncertainties in which the 
entire problem of elementary education is involved. 

The so-called practical man is especially insistent in his 
demands for tangible results. 

With our methods of reporting results it is really won- 
derful how liberally the public has contributed to the 
support of the public schools and how little the adverse 
criticism has been. So liberal has the public been that 
to-day we are spending more on education than all the 
rest of the world. In criticism of our work the business
man does not say that the child does not know anything
at all about adding, for instance, but he says he does not 
add with the proper speed and accuracy. He does not 
say the child cannot understand English at all; he says
he is not intelligent and cannot follow instructions. He
does not say the child cannot read; he says he reads too
slowly and misunderstands what he reads. Why should 
we not, when we are dealing with purely mechanical 
skills like the above, give absolutely definite specifica- 
tions of what we expect and see to it that no child is 
marked "passed" until these definite specifications are
reached?
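
A definite specification of this kind can be stated so exactly that meeting it is a matter of record rather than of opinion. The following is a minimal sketch, not a standard proposed by the text: the class `Specification`, the function `passes`, and the thresholds (eight problems per minute, ninety per cent correct) are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specification:
    """A definite standard for a mechanical skill such as addition.

    Both thresholds are hypothetical and serve only as an illustration.
    """
    min_problems_per_minute: float
    min_accuracy: float  # fraction of attempted problems answered correctly


def passes(attempted, correct, minutes, spec):
    """Return True only when both the speed and accuracy standards are met."""
    if attempted <= 0 or minutes <= 0:
        return False  # nothing was measured, so nothing can be certified
    speed = attempted / minutes
    accuracy = correct / attempted
    return (speed >= spec.min_problems_per_minute
            and accuracy >= spec.min_accuracy)


# Hypothetical standard: 8 problems a minute with 90 per cent correct.
addition_spec = Specification(min_problems_per_minute=8.0, min_accuracy=0.9)

print(passes(attempted=40, correct=38, minutes=4, spec=addition_spec))
print(passes(attempted=40, correct=30, minutes=4, spec=addition_spec))
```

Under these assumed thresholds the first pupil meets the specification and the second does not, since he works fast enough but answers only three problems in four correctly. This is the business man's complaint expressed in a form that can be checked.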

The American people has had an abiding faith that 
the great public schools would somehow eventually push
everyone to the place for which his disposition, talents, 
and psychophysical gifts prepare him; that they would
discover the strong and weak points of the children who 
will some day occupy the places of responsibility. With 
that thought in mind, and without demanding a strict 
accounting for the vast sums so lavishly spent, they con- 
tinue to have faith in the schools the accounts of which 
they are unable to audit because only the debit side of 
the ledger is kept. Science has changed other professions 
and businesses so that the public is becoming accustomed 
to look for tangible results. Just as in the factory when 
a piece of raw material is put through a certain process, 
we look for tangible, measurable results, so in the school 
system, when we put a child through an educational proc- 
ess, we look immediately for measurable results. If the 
desired result is still lacking, then the process may be 
repeated or a new process or method tried. This is 
undoubtedly the only sane way to teach school. The old 
method was to put the child through the process and, with- 
out measuring the result, trust to luck that the desired 
goal was reached. If the boy became a man of prominence,
the school-teacher was liable to take considerable credit
to herself for teaching him correctly. 

In this modern intensive life we cannot run the risk 
of waiting until adult life is reached to find out whether
our educational processes are bringing the proper results.
It is then too late. Society demands that we find out 
immediately what changes are wrought by our educational 
processes. This information may be gained only by hav- 
ing definite units of accomplishment and scales and units 
of measurement. The public demands that the results of 
training be recorded in sensible scientific units and that 
they be given immediately after the child has been put 
through the educational process so that deficiencies may 
be provided for before the child has passed beyond the 
influence of the schools. 

Fields of Educational Tests and Measurements. — There 
are many fields into which the subject of educational tests
and measurements may be divided. Seven are perhaps
definite enough to deserve special mention. These are:
(1) the measurements of intelligence; (2) the measurements
of school achievement; (3) the measurements of the
materials of instruction; (4) the measurements of the
physical growth of school children; (5) the measurements
of the money cost of education; (6) the measurements of
school buildings; and (7) the measurements of retardation,
acceleration, and elimination. 

Each of these fields is subdivided into narrower ones,
and some are re-combined to produce other fields as, for 
instance, when measures of intelligence and school achieve- 
ments are pressed into service to determine, in part, the 
efficiency of a teacher. Space will not permit an exhaustive 
treatment of these fields in an elementary work of this 
kind, any one of which furnishes sufficient subject matter 
for lengthy analyses and discussions. Nevertheless, a some-
what lengthy discussion of the first two fields mentioned
above will be given because it is in these fields particularly 
that the regular classroom teacher should acquaint her- 
self with the tools for measuring. It is in these fields 
that measurements come in direct contact with the learn-
ing process in the regular classroom work. The other fields,
especially the last three, perhaps pertain more to the 
administrative phases of education, and, while they affect 
the regular room-teacher, they are not quite so intimately 
connected with the regular recitation work. Five chap-
ters will be devoted to a discussion of the first two fields,
while the other five fields will be discussed in a single 
chapter. The dominant idea in this single chapter will 
be to map out and orientate the various divisions rather 
than to treat them exhaustively. In fact, the discussion 
will be limited to a few of the most characteristic phases 
of each of these five fields.



BIBLIOGRAPHY

1. Ayres, Leonard P., "Making Education Definite," Second
Annual Conference on Educational Measurements, Bulletin No.
11 (Indiana University, 1915), pp. 85-96.

2. Ayres, Leonard P., "The Measurement of Educational
Processes and Products," ibid., pp. 127-135.

3. Black, W. W., "The Movement for Greater Economy in 
Education," ibid., pp. 7-12. 

4. Bobbitt, John Franklin, The Curriculum (Houghton
Mifflin Co., 1918). 

5. Buckingham, B. R., "Efficiency Indices," Third Annual
Conference on Educational Measurements, Bulletin No. 6 (In- 
diana University, 1916), pp. 85-118. 

6. Bryan, William Lowe, "Common Sense and Science in
Education," Proceedings of a Conference on Educational Meas- 
urements, Bulletin No. 10 (Indiana University, 1914), pp. 8-9. 

7. Bagley, W. C., "The Determination of Minimum Essentials
in Elementary Geography and History," National Society for
the Study of Education, Fourteenth Year Book, Part I, pp.
131-146. 



8. Cubberley, Ellwood P., "The Significance of Educational
Measurements," Third Annual Conference on Educational Meas-
urements, Bulletin No. 6 (Indiana University, 1916), pp. 6-20.

9. Charters, W. W., "Scientific Curriculum Construction," 
Sixth Annual Conference on Educational Measurements, Bul- 
letin No. 1 (Indiana University, 1919), pp. 78-94. 

10. Monroe, Walter S., "The Next Step in Educational 
Measurements," ibid., pp. 94-103.

11. Rice, J. M., Scientific Management in Education (Pub- 
lishers Printing Co., New York, 1913). 

12. Rusk, Robert R., Introduction to Experimental Education
(Longmans, Green & Co., 1913). 

13. Thorndike, Edward L., "Units and Scales for Measuring
Educational Products," Proceedings of a Conference on Educa- 
tional Measurements, Bulletin No. 10 (Indiana University, 1914), 
pp. 128-141. 



CHAPTER III 

THE MEASUREMENT OF INTELLIGENCE 

General Statement of the Problem. — It is a platitude 
to say that individuals differ from one another in intellec-
tual achievements. We have known this from time
immemorial. The reasons why they differ from one an-
other are known only in part. It is known that one's
intellectual ability depends upon two factors: (1) native 
capacity, which is a physical inheritance and therefore 
beyond the control of the school; and (2) environmental 
conditions, a part of which may be directly controlled by
the school. It is the purpose of this chapter to discuss
some of the problems and technique that have to do with 
measurements of the first factor. The particular phase 
of this factor that we desire to measure is that known as 
"general intelligence." The subject is naturally divided 
into a number of major problems each of which is sub- 
divided into a number of lesser ones. Some of the major 
problems are: (1) What is meant by general intelligence? 
(2) What shall be the nature of the tests to measure 
general intelligence? (3) What shall be the units of 
measurement? (4) What shall be the methods of scoring?
Many lesser problems will arise as the discussion proceeds. 
Before discussing these problems, a brief statement as 
to the work which was being done in experimental 
psychology and psychiatry (that branch of neurology 
which treats of mental disorders and of the organic changes 
associated with them) that led up to the attempts to
measure intelligence will help to clarify the situation and
throw some light on the questions as to why this move- 
ment came into existence and why certain tests were used 
in the early stages of the work. Space will not permit 
more than the very briefest statements as to the general status
of psychology prior to the attempts to measure intelligence.
Until very recent times it was generally conceded that
psychical phenomena could not be subjected to quantita-
tive measurements. Real experimental psychology came
into being within the memory of men now living.

The honor of founding quantitative psychology belongs
to Gustav Theodor Fechner (1801-1887). It was as late
as 1860 that he published his Elemente der Psychophysik
which brought together scattered observations from 
astronomy, physics, and biology, which, together with his 
own elaborate observations in physics, mathematics, and 
physiology, were placed at the service of mental measure- 
ments. A student who would understand the principles 
and general methods of mental measurements must still 
go to school with Fechner. 

Wilhelm Wundt (1832-1920) did more to encourage and
inspire students to work in this field than any other modern
scientist. He established the first psychological laboratory
at Leipzig in 1879. When he began his experimental work
in psychology, there was little save the psychophysical
laws announced by Fechner, reaction times, the experimental
physiology of the senses, and the early studies in
brain localization.

At the time Wundt established his psychological
laboratory, the general field of psychology was beginning
to be divided into various fields each with its own methods
of investigation. There were subjective psychology which
relied wholly on inner perception, and objective psychology
which attempted to perfect and to supplement inner
perception by objective means. Objective psychology was
again divided into: (a) experimental or physiological
psychology, which brought inner perception under the control
of experimental appliances; and (b) social psychology,
which sought to derive general laws of psychological development
from the objective products of the collective [mind].

Wundt realized that if psychology was to advance it
must follow the inductive path. Two inductive methods
were available: (1) The method of statistics, which is
indirect and bears primarily on the practical with little
emphasis on theoretical psychology. This is the method
that furnishes psychology with its facts. (2) The method
of experiment, which, as Wundt says, is a principle
applicable over the whole range of psychology. It was
the latter method in which Wundt was primarily interested
and the one that chiefly concerns us here.

The physiologist E. H. Weber, desiring to determine
the power of the skin and of the muscle-sense to discriminate
between weights, began as early as 1830 to make
experiments in the field of sense perceptions. He devised
schemes for measuring the relation between the
intensity of stimuli and their corresponding sensations,
which relations were designated by Fechner as [Weber's]
laws.
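
The text does not print the relation between stimulus and sensation to which it refers. As it is conventionally written, and only as an aid to the reader, the relation may be sketched as follows. The symbols are assumptions introduced here: I is the intensity of the stimulus, \Delta I the just noticeable increase, S the magnitude of the sensation, I_0 the threshold intensity, and k and c constants that differ from one sense to another.

```latex
% Weber: the just noticeable increase is a constant fraction of the stimulus.
\frac{\Delta I}{I} = k
% Fechner: sensation grows as the logarithm of the stimulus.
S = c \, \log \frac{I}{I_0}
```

On this reading, a weight must be increased by the same fraction, not by the same number of grams, before the difference is felt.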

When certain psychological tests were devised to investigate
special mental functions, it was found that the
results obtained from these tests varied with the degree
of intelligence of the subjects. When this discovery was
made, these tests were diverted from their original purpose
and were employed in estimating endowment in general
intelligence. For instance, the investigators
found that in sensory discrimination the most intelligent
subjects had the lowest thresholds. It was concluded therefrom
that all that was necessary was to measure the subject's
power of sensory discrimination, such as the sensitivity
to differences in pitch, or the ability to distinguish
between the length of two lines, in order to determine
the degree of intelligence of an individual. It was thought
also that tests in memory and association would give results
indicative of the degree of intelligence. Further investigation,
however, proved that these assumptions were not
true. It happens very frequently that an exceptionally
good memory is possessed by one with low intelligence.
From this humble beginning there has been organized 
a vast body of methods and results requiring several hundred
standard experiments and a great army of experts
who spend most of their time in the laboratory employing
these new methods of observation and introspection. 

Why Work in Psychological Measurements Was Retarded. — The
intrinsic difficulty of psychological measurements
is undoubtedly largely responsible for their tardy
advent and lack of progress. The recesses of the mind
were apparently inaccessible. But there are other reasons
why progress was not made in this science.

Descartes had drawn a sharp line of demarcation
in popular thought between the natural and the mental
sciences. The former were considered quantitative,
and hence measurable, the latter, qualitative, and not
measurable. This popular "common-sense" point of view
thus had the weight and inertia of a settled tradition.
Immanuel Kant (1724-1804) reinforced the "common-sense"
view from the fields of philosophy. He declared,
in 1786, that psychology never could obtain the rank of
pure science. To overcome the effects of a dogmatic statement
of a philosopher like Kant was a difficult thing to
do. Of course, there were plenty of ideas and suggestions
as to what might be done. The path of scientific progress
is littered with brilliant suggestions. It is so easy to
suggest but so difficult to grasp the suggestion and carry
it to its conclusion.

Not only did mental measurements get a late start but 
their very nature makes progress slow. Like political 
constitutions, new branches of knowledge are not made 
but must grow. Improvements will be slow and discus- 
sion is one of the means by which they are made. 

Effects of Wundtian Laboratories. — Just as the Binet
tests have occupied the center of the stage in the field of
intelligence measurements, so Wundt and Wundtian
laboratories have been the center of the movements in
experimental psychology for the last fifty years. Many
of Wundt's students, Titchener, Hall, Kraepelin, Müller,
Cattell, Meumann, and others, have gone far beyond him
in many lines. Dr. Stanley Hall founded the first
psychological laboratory in the United States at Johns
Hopkins University in the eighties. Within a few years
Wundtian laboratories in experimental psychology were
established at Philadelphia and at Columbia by Cattell;
at Cornell by Titchener; and at Harvard by Münsterberg.

Beginning in the small and more accessible fields of 
sensation, experimental methods gradually moved up 
toward the more complex problems of association, atten- 
tion, voluntary movements, will, the higher thought proc- 
esses, and even feelings. But little has been done, how- 
ever, in the last-named field. Of all mental processes, the 
feelings, the affective states, are the most baffling and least 
amenable to measurements. 

As early as 1912 fifteen of the more progressive asylums
were applying Wundtian methods to some extent in the 
study of insanity. Wundt realized, as few scientists did, 
that progress in a science is bound up with progress in 
the methods of investigation. Every new instrument used 
is followed by a series of new discoveries, and modern
science itself originated in a revolution of methods in the
hands of Bacon, Galileo, and others. 

Psychology made little progress from the time of Aris- 
totle to Wundt because there is no scientific field so 
crowded with presuppositions and prejudices and so 
barren of scientific methods of investigation. It is the 
spirit of Wundt more than anything else, perhaps, that 
has led to the recent improvement in controlled observa- 
tional methods in the study of animal instinct, from 
tropisms up to tests of intelligence. 

The methods and tools developed and employed in the 
psychological laboratory were soon to fall into the hands 
of those interested in mental development not only from 
the standpoint of applied science, but also with the idea 
of helping some of the more unfortunate ones in our 
society. Men motivated by altruistic and philanthropic 
ideals sought to better mankind by the application of
science in the field of intelligence. It was the psychiatrists 
dealing with abnormal adults who first wanted to test 
intelligence. The movement was at first wholly within 
the field of psychopathology. These men devised tests 
and systems of tests. By far the greater portion of these 
tests took on the character of questions and qualitative
tests rather than that of quantitatively gradable tests. 
Their methods were open to many criticisms. The method 
of determining intelligence was to test a few individuals 
a great many times with one or more tests and then apply 
the tests to a great many individuals only once, or rarely 
more than two or three times. If it was found, on the 
whole, that persons known to possess more than average 
intelligence obtained better averages than the less intelligent,
it was assumed that the methods employed would
answer for the testing of intelligence. Most of the early
testing was of this sort. Their interest was primarily with 
abnormal individuals. Whatever testing the psychiatrist
did on normal individuals was simply to establish norms
for the measurement of abnormal adults. Whatever data
were gathered in reference to normal individuals came
about as a "by-product" of this process. They knew little
about standards for normal adults with which the performances
of abnormal subjects were to be compared, and
they knew absolutely nothing about standards for children.
What is more, one normal standard is not enough. With
children every age-level must have its own standard. The
magnitude of a defect in intelligence in a nine-year-old
child can be determined only by comparing it with a
normal nine-year-old intelligence.
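
The point about age-level standards can be put in miniature. The sketch below is only an illustration under stated assumptions: the scores are invented, and the names NORMAL_SCORES_BY_AGE, age_norm, and deficit are hypothetical and belong to no published scale. It shows that one and the same raw score signifies a defect when judged by the nine-year-old standard and none when judged by the eight-year-old standard.

```python
from statistics import mean

# Invented raw scores (number of tests passed) of normal children, by age.
NORMAL_SCORES_BY_AGE = {
    8: [14, 15, 16, 15, 15],
    9: [18, 19, 20, 19, 19],
    10: [22, 23, 24, 23, 23],
}


def age_norm(age):
    """Average score of the normal children of the given age."""
    return mean(NORMAL_SCORES_BY_AGE[age])


def deficit(age, score):
    """Amount by which a child's score falls below the norm for his own age."""
    return age_norm(age) - score


# The same score of 15, judged first as a nine-year-old, then as an eight-year-old.
print(deficit(9, 15))
print(deficit(8, 15))
```

A single standard taken from one age would misjudge every child of any other age, which is the reason each age-level is given its own norm.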

In the purely psychological fields men were studying
the psychology of testimony, optical illusions, association
experiments, tachistoscopic experiments, the learning of
nonsense syllables, etc. The first tests made by psychologists
were not designed as measures of intelligence. They
seem to have arisen as a direct result of the individual
differences noted in the laboratory by the experimental
psychologists. At first these individual differences were
a distinct hindrance because they made the establishment
of psychological laws difficult. But psychologists became
interested in them for their own sake, and once this occurred
we had the birth of the test designed to measure the
mental differences between individuals. The first tests were
concerned with measurements of specific "faculties" or capacities.
They were tests of different mental processes or
of different states of consciousness. At first the tendency
was to ignore individual differences of pupils. Now the
problem is to individualize instruction which will accommodate
these differences. The opinion is not unanimous
that differences should be accentuated, but the tendency
towards individual instruction will [bring]
about this result.

Many persons were using these [...]
as tests of intelligence. The chief hindrance both to the
psychiatrist and to the psychologist in attempting to
measure intelligence up to this time was, perhaps, a lack
of definite working hypotheses as to what intelligence really 
is. The ordinary school examinations would give them a 
notion of the pupil's knowledge and of his external 
accomplishments, but they do not afford an index to his 
inner endowment, his mental maturity and power. What 
this endowment is, and a means for measuring it, were 
obviously the next steps in the scientific procedure of men- 
tal measurements. The more or less blind probing with 
poor methods for an unknown characteristic brought poor 
results. They realized that tests must be selected in 
accordance with rather definite hypotheses of the exact 
nature of intelligence. Therefore definite hypotheses of 
just what intelligence is, with better and more scientific 
measuring instruments, were the crying needs of the hour. 
It is here that work began in earnest on the measurement of 
intelligence. 

What Is General Intelligence? — The following defini- 
tions of intelligence can, of course, be nothing more than 
working hypotheses in this field. We shall note a number 
of definitions of this kind. Stern¹ defines intelligence as
"a general capacity of an individual consciously to adjust
his thinking to new requirements: it is general mental
adaptability to new problems and conditions of life." He 
thinks this definition clearly differentiates intelligence 
from other mental capacities such as memory, genius, 
talent, etc. Adjusting one's thinking to new requirements 
obliterates the effect of memory; forcing the examinee to 
adapt himself to the performance of problems already set 
differentiates intelligence from genius, the nature of which
is to create the new spontaneously. The fact that it is a
general capacity distinguishes intelligence from talent, the
chief characteristic of which is the limitation of efficiency
in one kind of content.²

¹ William Stern, The Psychological Methods of Testing Intelligence,
translated by Guy Montrose Whipple (Warwick & York, 1914), p. 3.

Binet's conception of intelligence emphasizes three char- 
acteristics of the thought process: (1) its tendency to 
take and maintain a definite direction; (2) the capacity 
to make adaptations for the purpose of attaining a desired 
end; and (3) the power of auto-criticism.³ In his earlier
work Binet believed that the essence of intelligence was
capacity to adjust the attention.⁴

Meumann's conception of intelligence is not quite clear. 
At times he lays great stress on the understanding of the 
abstract as the root of intelligence. He makes use of the 
retentive powers of memory in the learning of abstract 
words; he holds that the power of independent and crea- 
tive elaboration of new products out of the material given 
by memory and the senses is a manifestation of intelligence. 
In practical affairs intelligence, according to Meumann, 
means the ability to avoid errors, to adjust one's self to 
his environment, and to surmount difficulties. He makes
extensive use of what is known as the "Masselon experiment,"
which gives the subject a number of words that
are to be used in a sentence. He thinks this is a reliable
index of the maturity of the associative processes. 

² Ibid., pp. 3-4.

³ "L'intelligence des imbéciles," L'Année Psychologique, 1909, pp.
1-147.

⁴ Stern, op. cit., p. 17.

According to Ebbinghaus, the essence of intelligence lies
in comprehending together, in a unitary meaningful whole,
impressions and associations that are more or less independent,
heterogeneous, or even partly contradictory.
"Intellectual ability consists in the elaboration of a whole
into its worth and meaning by means of many-sided combination,
correction and completion of numerous kindred
associations." He thinks that every true instance of intellectual
ability may, in the last analysis, be reduced to an
act of combining. It is a combination activity. To test
the maturity of this combining activity, he gives the subject
sentences broken up into parts with gaps in them,
words left out, and he asks the subject to supply the missing
parts so as to make the sentences read correctly. The
same principle was used by Healy in the Picture Completion
Tests. It is also used in many other experiments.

Ziehen attempts to measure intelligence by the use of
tests of retention, development, comprehension, and gen- 
eralization. 

Is Intelligence a General Faculty of the Mind? — At
the present time, the term general intelligence is commonly 
understood to mean an innate ability or group of abilities 
that lie at the basis of the acquired intelligence of an 
individual. We know that intelligence itself is not in- 
born, but only the capacity to become intelligent. Whether 
general intelligence signifies a single inborn capacity which 
functions in all situations, or a large number of specific 
capacities, more or less related, which enable an individual 
to acquire intelligent behavior in many different activities, 
is a question that has not been settled by psychologists.
Spearman, Hart, and Burt explain innate intelligence as 
a "general common factor." Spearman has developed a 
mathematical formula which shows the correlation between 
the various faculties of the mind, and hints at a general 
common factor in all mental performances, which is known 
popularly as "general intelligence." 
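
The text does not print Spearman's formula. One well-known check associated with his theory is the tetrad difference: if every correlation arises from a single common factor, so that the correlation of tests a and b is the product of their loadings on that factor, then r_ab * r_cd - r_ac * r_bd vanishes for any four tests. The sketch below is offered only as an illustration of that idea. The loadings, the test names, and the function names are all invented for the purpose.

```python
def correlations_from_one_factor(loadings):
    """Correlation table implied by a single common factor: r_ij = g_i * g_j."""
    return {
        a: {b: loadings[a] * loadings[b] for b in loadings if b != a}
        for a in loadings
    }


def tetrad_differences(r, a, b, c, d):
    """The three tetrad differences for four tests.

    All three are zero when one common factor accounts for every correlation.
    """
    return (
        r[a][b] * r[c][d] - r[a][c] * r[b][d],
        r[a][c] * r[b][d] - r[a][d] * r[b][c],
        r[a][b] * r[c][d] - r[a][d] * r[b][c],
    )


# Invented loadings of four tests on the supposed common factor.
loadings = {"memory": 0.8, "reasoning": 0.7, "pitch": 0.5, "lines": 0.6}
tests = ("memory", "reasoning", "pitch", "lines")

r = correlations_from_one_factor(loadings)
one_factor = tetrad_differences(r, *tests)

# Now suppose the two sensory tests are tied to each other more closely
# than the common factor alone would allow.
r["pitch"]["lines"] = r["lines"]["pitch"] = 0.6
extra_tie = tetrad_differences(r, *tests)

print([round(t, 3) for t in one_factor])
print([round(t, 3) for t in extra_tie])
```

When two tests share something beyond the common factor, some of the tetrad differences depart from zero, and this is the kind of evidence on which the dispute over a single general factor turns.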

Burt, employing in his investigations the methods of 
Spearman, concludes that there is a general function —
a greatest common measure — permeating, to a greater or
less extent, the various special functions measured by his
tests. He thinks that the measurements obtained are
measurements, more or less indirect, of a single capacity,
and not determined purely by different capacities in different
cases; that the idea of an all-around mental efficiency
applicable in many directions is a legitimate
conception.

Thorndike does not endorse the idea of the existence of
the common factor denoted by the term "general intelligence."
Relative to this point he says:

This doctrine requires not only that all branches of intellectual
activity be positively correlated, which is substantially true, but
also that they be bound to each other in all cases by one common
factor, which is false. The latter would require that no two
intellectual abilities or branches of intellectual activity should
be more closely related to each other than to the fundamental
function by which alone they are supposed to be related. . . .
But unless one arbitrarily limits the meaning of "all branches
of intellectual activity" so as to exclude a majority of those so
far tested, one finds traits closely related to each other but with
their common element only loosely related to the common element
of some other pair. . . . The mind must be regarded not as a
functional unit, nor even as a collection of a few general faculties
which work irrespective of particular material, but rather as
a multitude of functions each of which involves content as well
as form and so is related closely to only a few of its fellows,
to the others with greater and greater degrees of remoteness.

Meumann and others criticize the idea of a general faculty
known as general intelligence and cite many reasons why
this hypothesis is contrary to the facts. The arguments
are too long to present here, and suffice it to say that most
psychologists deny the existence of this general faculty. 




We might go on at length giving definitions of intelli- 
gence, which are, at best, nothing more than working 
hypotheses; nevertheless, such hypotheses serve as guides 
in our attempt to measure an inheritance which conditions 
all our mental achievements. 

Inability to Define Intelligence Accurately Does Not
Prohibit Measurements. — It may seem at first thought
that it would be impossible to measure a thing that has
not been defined. This, however, is not the case. Stern
points out that electromotive force was measured long before
electric currents were well understood; that many
diseases were diagnosed and successfully treated when very
little was known of their real causes. The whole science
of chemistry, for instance, is built up on the supposition
that matter is composed of molecules and atoms, and the
definitions given for them are at best but working hypotheses.
So the handicap of being unable clearly to define
intelligence may not be as great a hindrance as it might
seem. The above definitions of intelligence differ more because
of different points of view than because men do
not agree as to what intelligence is. The tests that we shall
now describe attempt to approach this capacity from many
angles; hence the difference in the tests.

Since the Binet tests are by far the most important,
both as to origin and as to the fundamental principles
used, a short historical sketch of their development will
throw much light on attempts at measuring intelligence.
After this brief historical statement, we shall again refer
to the nature of intelligence in the discussion of the types
of tests used to measure it. Then we shall give some of
the most modern conceptions of it as discussed by writers
in this field. 

The Binet Tests. — The Minister of Education in France
in 1904 decided to separate the subnormal from the normal
children in the public schools of that nation. For this
difficult task he called upon Alfred Binet to devise a series
of tests which might be used for that purpose. Binet had
had a wide experience with the education of children. He
had been president of the Société Libre pour l'Étude de
l'Enfant for a number of years. At the suggestion of
many teachers he had organized a committee for the care
of abnormal children which initiated various investigations
relative to backward children. He had worked with children
for many years studying their peculiarities and
proclivities.

Binet consented to undertake the difficult task and, calling
to his assistance the physician, Théodore Simon, devised
a series of 30 tests, the chief purpose of which was to
detect subnormality. After trying these tests on 203 school
children in Paris, both Binet and Simon came to the conclusion
that it was possible to devise a series of tests that
would not only detect subnormality but also serve as a
definite measure of mental unfoldment. With this thought
in mind, they devised a scale consisting of 54 tests which
they published in 1908. This scale was revised and republished
in 1911.

Before taking up the problems confronting Binet and 
Simon in making an intelligence scale, a tabular synopsis
of the 1911 Revision as adapted to American conditions 
will be presented for purposes of reference. A general 
description of the scale will then follow with an interpreta- 
tion of a number of the descriptive terms used. 



Tabular Synopsis of the Binet-Simon Scale, 1911 Edition

Age 3: 

1. Points to nose, eyes, and mouth.

2. Repeats two digits. 

3. Enumerates objects in a picture. 

4. Gives family name. 

5. Repeats a sentence of six syllables.


Age 4: 

1. Gives his sex. 

2. Names key, knife, and penny. 

3. Repeats three digits. 

4. Compares two lines. 

Age 5: 

1. Compares two weights. 

2. Copies a square. 

3. Repeats a sentence of ten syllables. 

4. Counts four pennies. 

5. Unites the halves of a divided rectangle. 

Age 6: 

1. Distinguishes between morning and afternoon. 

2. Defines familiar words in terms of use. 

3. Copies a diamond. 

4. Counts thirteen pennies. 

5. Distinguishes pictures of ugly and pretty faces. 

Age 7: 

1. Shows right hand and left ear. 

2. Describes a picture. 

3. Executes three commissions, given simultaneously. 

4. Counts the value of six sous, three of which are double. 

5. Names four cardinal colors. 

Age 8: 

1. Compares two objects from memory. 

2. Counts from 20 to 0. 

3. Notes omissions from pictures. 

4. Gives day and date. 

5. Repeats five digits. 
Age 9: 

1. Gives change from twenty sous. 

2. Defines familiar words in terms superior to use. 

3. Recognizes all the pieces of money. 

4. Names the months of the year, in order. 

5. Answers easy "comprehension questions." 
Age 10: 

1. Arranges five blocks in order of weight. 

2. Copies drawings from memory. 

3. Criticizes absurd statements. 

4. Answers difficult "comprehension questions." 

5. Uses three given words in not more than two sentences. 




Age 12: 

1. Resists suggestion, 

2. Composes one sentence containing three given words. 

3. Names sixty words in three minutes. 

4. Defines certain abstract words. 

5. Discovers the sense of a disarranged sentence.
Age 15: 

1. Repeats seven digits. 

2. Finds three rhymes for a given word. 

3. Repeats a sentence of twenty-six syllables.

4. Interprets pictures. 

5. Interprets given facts.
Adult: 

1. Solves the paper-cutting test. 

2. Rearranges a triangle in imagination. 

3. Gives differences between pairs of abstract terms. 

4. Gives three differences between a president and a king.

5. Gives the main thought of a selection which he has heard read.

The Binet-Simon scale is an instrument for measuring
mental maturity. By maturity we mean the development 
of native capacity as a whole by growth, training, and 
environment. Mental growth is a gradual increase of 
capacity for learning which comes as a result of the development
of the nervous system apart from all training.
By native capacity or endowment we mean the special 
capacity for functioning with which nature has provided 
the individual. Inborn capacity manifests itself only 
through learning. An individual born with a great
capacity to become intelligent, but denied the opportunity
to learn, would possess no intelligence. Intelligence must
be acquired. 

The tests in the Binet scale are arranged in progressive 
steps of increasing difficulty, each higher step involving
the more tardily appearing functions, such as reasoning,
complex comparisons, the associative functions, and the
like, which depend directly on the maturation of the native
capacities. The authors of the scale worked on the assumption
that the intellectual ability of children of a given age
tended to approach a relatively well-marked norm. The
tests in each age-group were selected on this basis. The
mental scale is merely the grouping together of individual
tests in order to give a more general picture of the mental
make-up of the individual. Binet originated the idea of
grouping tests for estimating intelligence. For a long time
he had been interested in the question of tests for various
specific abilities. His work gradually led him to a study
of individual cases, and, in summing up the psychological
characteristics of individuals, as revealed by the mental
tests, he came upon the idea of using a number of tests
as a measure of the individual's capacity. In addition to
this, his theoretical speculations as to what the tests were
testing, led him to the conclusion that "attention" and
"adaptation" were at bottom the chief factors that distinguished
the intelligent from the unintelligent. The
practical situation presented to Binet of separating the
normal from the subnormal children of France called forth
the first actual group of tests for differentiating intelligent
and unintelligent children. He was called upon to discriminate
between the normal and backward child, and
the question was not whether this or that child was better
in such a specific thing as memory or imagination, but
whether the child was, in general, weaker in his intellectual
endowment than the average child of his age. He therefore
discarded the individual tests for specific ability and
took a group of tests which seemed to cover in general
the chief psychological characteristics that go to make up
intelligence. It was Binet, therefore, who really blazed
the trail through the jungle of mental measurements and
left us a path which leads to the general abilities of a
child's mental life. As the norm or standard of intelligence,
he took what the average child at each age could do.

These two points, the use of the group tests and the aver- 
age performance at each age as a standard of measurement, 
form the basic principles upon which all of our measuring 
scales of intelligence now rest. 

The Binet Tests Had Many Innovations. — Binet attempted
to measure the higher thought processes instead of
the simpler ones such as sensory discrimination, reaction
time, retentiveness, and the like. He abandoned the older
"faculty psychology" which had given direction to most
of the testing up to this time and set problems for
the reasoning powers — problems that provoke judgment
about abstract matters and problems that draw on the
discriminating powers of the examinee. If the faculties
of the mind were separate and distinct so that they might
be measured singly and then summated to get an idea of
general intelligence, the problem might be simplified. But
they are so interwoven and intertwined that they cannot
be isolated for measurement. Memory, for instance, cannot
be tested separate and apart from attention and other
faculties because of the interfunctioning of these faculties.
Of course, most mental phenomena elude absolute measurements
in terms of amount; but they can be measured in
terms of their relative magnitude. Although we cannot 
equate them, we can subject them to quantitative treat- 
ment. 

The Constituent Functions of Intelligence Must Be
Brought into Play. — Just as the total character of an 
individual cannot be determined by judging a single char- 
acteristic of his behavior, so his general intelligence cannot 
be determined by measuring the strength of one phase of 
his mental life. We must test many processes before a
comprehensive idea of an individual's general intelligence
can be gained. For instance, if a completion test is used, 
would one be safe in saying that, because the child has 
the power to fill blanks in which words had been left out 
of a sentence or paragraph, he has a high degree of in- 
telligence, whereas a child who cannot do this does not 
possess intelligence to the same degree? Or would a child's 
ability to repeat a number of words or figures be an in- 
fallible index of his general intelligence? 

General intelligence is not simply the functioning of 
individual processes as such. It depends on their correla- 
tion and interfunctioning. While there was much criti- 
cism of the single tests and systems of tests used by the 
psychiatrist, which tested only one phase of the mind and 
attempted to judge the whole from this single character- 
istic, there is also much criticism of the Binet tests on 
the same general principles. This will be discussed more 
fully under the criticisms of the Binet scale. 

Binet attempted to make a scale that would give a kind 
of composite picture of the individual's mentality by test- 
ing the strength of a limited number of these processes. 
The critics say that the scale is much better than the single 
tests of the psychiatrist but that it still fails to test enough 
characteristics to make the picture anything like complete. 
They say that because of the limited number of character- 
istics tested a lack of development of one trait, or an over- 
development of another, tends to give a general result not 
in keeping with the facts. 

The degree of interfunctioning between the various 
processes may be illustrated as follows: If a child's atten- 
tion is nil, or nearly so, then his perceptive power does 
not focus long enough to produce the cortical alterations 
giving memory. Therefore, attempts at measuring memory 
also measure perception and other abilities of the mind. 
In this way a clue to the interfunctioning and interdependence
of the various processes is obtained.




The Kind of Mental Functions Brought into Play. —
Binet differed from the psychiatrist and other psychologists
in the type of tests he designed. His tests were designed
to test the higher and more complex thought processes
such as reasoning power, abstract judgments, and the like,
instead of attempting to measure sensory discrimination,
the rapidity of reaction, and other lower and less complex
powers. Up to this time it was considered impossible to
measure these complex processes. The old "faculty
psychology" had given direction to the earlier testing. It
seemed to both the psychiatrist and the psychologist that
sensory discrimination, memory, attention, and other
processes of the mind, could be measured better if taken
separately than if an attempt were made at summating
the general results of all the different aspects of intelligence.
This might be true were it not for the interfunctioning
of the various aspects of the mind which makes
it impossible to completely isolate any one process for
examination. Binet realized this fact and, instead of attempting
to measure one aspect of the mind, he undertook
to ascertain the general level of intelligence.

It is now generally conceded that such elementary
processes as sensory discrimination (like distinguishing
between two shades of color), reaction-time (as the rapidity
with which an individual can tap), and visual acuity have
but little to do with the higher thought processes, since
many feeble-minded children have keen powers in sensory
discrimination. It is apparently the power of comprehension,
abstraction, and the ability to direct thought that
separates the normal from the subnormal. Hence the other
tests were very largely discarded by Binet. He sought to
bring into play only those mental processes thought to be
so closely concerned with the native ability that they
would give him an insight into this native endowment. If
the percentages of passes did not increase in going from






THE MEASUREMENT OF INTELLIGENCE 75 

younger children to older ones, the test was considered unfit
since it did not indicate the degree of maturity of the
developing intelligence. Or again, if children known to
be bright passed a certain test more frequently than children
known to be dull, such a test was considered satisfactory.
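
The two checks just described may be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Binet's own procedure: the function names and the sample percentages are hypothetical.

```python
def passes_increase_with_age(pass_rate_by_age):
    """Return True if the share of children passing rises from each age to the next."""
    ages = sorted(pass_rate_by_age)
    rates = [pass_rate_by_age[age] for age in ages]
    return all(later > earlier for earlier, later in zip(rates, rates[1:]))


def separates_bright_from_dull(bright_pass_rate, dull_pass_rate):
    """Return True if children judged bright pass more often than those judged dull."""
    return bright_pass_rate > dull_pass_rate


# Hypothetical pass rates for one test, keyed by age in years.
sample_rates = {3: 0.15, 4: 0.40, 5: 0.75, 6: 1.00}
print(passes_increase_with_age(sample_rates))
print(separates_bright_from_dull(0.80, 0.35))
```

A test would be retained, on this reading, only when the relevant check returns True.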

Establishment of the Zone of Normality. — Binet was 
confronted with a practical problem, that of separating the 
subnormal from the normal children in France. It is ob- 
vious that he could not detect the subnormal children unless 
he knew what was meant by normal ones. He had no 
precedent to follow in this difficult task. No one had deter- 
mined what mental equipment a child must have to be 
considered normal. It was not sufficient to determine the 
mental equipment of a group of children of one age and 
take that as the standard, because the standard for each 
age had to be established if the degree of maturity of the
child's native mental endowment was to be determined. 
It was therefore necessary to take a group of children for 
each age and determine normality for that age. 

The idea of the age-grade method for measuring intelli- 
gence did not come to Binet until he had experimented 
with tests for fifteen years. The provisional scale published 
in 1905 did not employ the age-grade method but con- 
sisted of 32 tests roughly arranged in the order of their 
difficulty. Since no account is given as to how he came 
to employ the age-grade method, the supposition is that 
in working with the data gathered from the provisional 
scale made in 1905 he hit upon the age-grade idea. Suffice
it to say that the age-grade idea was relatively complete 
in his 1908 scale. 

Three problems confronted him in this task: (1) He
must arbitrarily or otherwise choose for each age a group
of children that he considered normal. (2) He must
determine the general intelligence of each of these groups.
(3) He must determine the boundary lines which separate
the zone of normality from the supernormal on the
one hand and from the subnormal on the other.

In the solution of the first problem, if he were to choose 
the children from the higher classes in France his standard 
would be too high, assuming that the children from the
upper classes had a larger stock of native mental endow- 
ment than those from the lower classes. On the other hand, 
if the children selected were known to be intellectually 
inferior, his scale would not be a fair measure of what a 
child's mental equipment really is. He apparently made 
his selection on the assumption that the normal child is 
the so-called average child; the common, ordinary child; 
the child who has that mental equipment possessed by the 
greatest number of children of that particular age. With 
the help of some of the teachers he then chose a group of 
children for each age which he thought represented the 
average child. His next problem was to determine the 
general intelligence of the groups chosen. This was done 
by a series of tests. He had tests of varying difficulty
which he gave to each group. He found the number of 
children in each group who were able to pass the tests and 
thus, tentatively at least, determined the amount and de- 
gree of maturity of what he considered the native mental 
endowment of normal children. His method of determin- 
ing what tests should belong to a particular age-level will 
be discussed later. 

In order to further clarify the exact problems Binet had 
before him a diagrammatic representation of the range of 
mentality is given in Fig. I. In the great range of 
mentality from the idiot on the one hand to the genius 
on the other there seems to be a gradual increase of intelligence,
and somewhere near the middle of this range
is a section which may be designated as the zone of
normality. 






We may represent the range and distribution of men- 
tality in Figure I thus: Let the left end of the line AB 
represent the lowest stages of mentality and the right end 
the highest. The curve is known as a normal probability 
curve, normal frequency curve, or normal distribution 
curve. It will be described more fully in a subsequent 
chapter. The number of cases having the various degrees 
of mentality is represented by the height of the curve above 
the base line AB. Thus, at the left the small number of 
people with extremely low mentality is represented by the 

[Figure: ILLUSTRATING THE DISTRIBUTION OF MENTAL ABILITY
ACCORDING TO THE NORMAL PROBABILITY CURVE]

line CD. As the mental powers increase, the number of
cases also increases until it reaches a maximum at point M. 
That is, a line drawn through point M perpendicular to 
the line AB is the longest line that can be drawn within 
the figure, and thus represents the degree of intelligence 
possessed by the largest group of people. Not only is it 
theoretically true that the distribution of intelligence is 
according to the normal frequency curve, but an actual 
count taken of any considerable number of unselected 
people shows that the distribution conforms remarkably 
well to the normal distribution curve. Thus the figures of
the Royal Commission of Great Britain, using the social-economic
criterion of mental deficiency as a measure, show
the distribution at the low end of the curve for certain
districts surveyed to be: 585 idiots, 1,007 imbeciles, and
9[?]28 feeble-minded (morons).'* This, of course, is not an
unselected group, but it does show the distribution among
people whose mentality is below normal. The number of
cases above normal then decreases until at the extreme
right the curve comes down to the base line AB, or nearly
so, because there are few extremely bright or intelligent
people.
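
The shape described above may be checked numerically. The sketch below assumes that positions along the base line AB are measured in standard deviations from the point M, a unit the text itself does not use; the function name is hypothetical.

```python
import math


def curve_height(z):
    """Height of the standard normal curve at z standard deviations from M."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# The height is greatest at M (z = 0) and falls away equally on both sides.
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"{z:+d}  {curve_height(z):.4f}")
```

The printed heights shrink toward either end of the line, which corresponds to the small number of cases at the extremes.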

Near the middle of these two extremes along the line 
AB is a zone which we may call the zone of normality, the 
width of which may be represented by the line FP. The 
positions of the points F and P are arbitrarily chosen; hence
the width of the zone of normality will vary according to
the judgment of the scientist. It is important to note that
normality does not mean a point on the scale but the distance
between two points arbitrarily chosen. Reasonable
latitude must be allowed for the pendulum of intelligence
to swing a minor arc to the right or to the left and still 
remain within the zone of normality. 

The third problem mentioned above, therefore, is settled 
in an arbitrary way. The zone of normality will vary in 
width according to the scientist, and no sharp line of 
demarcation will separate the normal from the subnormal. 

Haberman defines a normal individual as "one whose
reaction to given stimuli is no more or less in degree and
manner than a certain quantum that we have become accustomed
to."" This definition apparently conforms to
Binet's conception of a normal individual.

^ The usual method of classifying people below the normal is to
call those belonging to the lowest stage of mentality, idiots, and, in
ascending order imbeciles, feeble-minded, retarded, and normal.

" J. Victor Haberman, The Intelligence Examination and Evaluation,
p. 2.




Criteria for Separating the Normal from the Subnormal.
— There are various ways of separating the
normal from the subnormal. They are sometimes desig- 
nated as: (1) the social-economic criterion; (2) the peda- 
gogical criterion; (3) the medical criterion; and (4) the 
psychological criterion.^' 

The definition of a feeble-minded person given by the 
Royal Commission of Great Britain, which investigated the 
question of mental deficiency in 1904, illustrates the
social-economic criterion for measuring intelligence. A
feeble-minded person was defined as: "one who is capable of
earning a living under favorable circumstances, but is in- 
capable, from mental defect existing from birth, or from 
an early age, (a) of competing on equal terms with his 
normal fellows; or (b) of managing himself and his affairs
with ordinary prudence." 

The "jokers" in a definition of this kind are, of course, 
the expressions, "competing on equal terms" and "or- 
dinary prudence." These allow a great deal of freedom 
in defining a feeble-minded individual. 

The pedagogical criterion has been used for a long time 
as a basis for separating normal from subnormal children, 
the usual custom being to call children feeble-minded who 
are retarded pedagogically three years or more. This 
method is obviously defective because of the many things 
that might cause a child to be retarded three or more years 
and still have a normal mind or one almost normal. Sickness,
a late start in school, bad eyesight, and poor economic
conditions illustrate the point. 

The medical criterion is based on the assumption that 
mental deficiency is analogous to physical disease. Binet 
has tersely stated the chief objections to the medical 
criterion as follows : 





Each one according to his own fancy fixes the boundary line
separating these states. It is in regard to the facts that the
doctors disagree. In looking closely, one can see that the confusion
comes principally from the fault in the method of examination.
When an alienist finds himself in the presence of a child
of inferior intelligence, he does not examine him by bringing
out each of the symptoms which the child manifests, and by
interpreting all symptoms and classifying them; he contents
himself with taking a subjective impression as a whole of his
subject, and of making his diagnosis by instinct. We do not
think we are going too far in saying that at the present time very
few physicians would be able to cite with absolute precision the
objective and invariable sign or signs by which they distinguish
the degrees of inferior mentality.

The physician has been trained to diagnose and treat 
physical disorders primarily, and when confronted with a 
mental condition reasons very largely from analogy of what 
he knows of physical states."

It is, of course, the psychological criterion in which we
are primarily interested. Each of the other attempts at
classifying individuals has depended so much upon the
judgment of those making the classification that it is far
less definite than it should be for practical use.

A knowledge of the distribution of intelligence among
the people of a nation is of great importance both
theoretically and practically. It is worth a great deal to
know what percentage of the people may fall within the
range of normality and to find out what positions on the
scale for testing mental development are symptomatic of
mental deficiency. It is also important, from a sociological
standpoint, to know that adults testing below a certain
age are so low in intellectual development that it is a
question as to whether they have sufficient equipment to
survive socially. Adults who test only ten years old mentally,
for instance, are an uncertain group in intellectual
ability with the probability that they will require more
or less social care, while those who test only nine years
old are deficient enough to need continuous care. Goddard "
found no case at the Vineland School for the Feeble-Minded
which tested higher than 12 mentally. Huey found but
two such cases in the State Asylum at Lincoln, Ill., and
Kuhlmann found only ten at the Minnesota State School
for the Feeble-Minded."

In the distribution of intelligence Terman found that
about 60 per cent of all school children test between 90
and 110 I. Q. ("I. Q." means intelligence quotient and is
the quotient found by multiplying the mental age by 100
and dividing by the chronological age). About 40 per cent
test between 95 and 105. An intelligence quotient of 110
to 120 is five times as frequent among children of superior
social status as among those of inferior social standing;
the proportion among the superior social group being 24
per cent of all, whereas but 5 per cent of those that belong
to the inferior social group test with an I. Q. of from
110-120. Not more than three children out of 100 score
as high as 125 I. Q. and only one in 100 as high as 130,
while an I. Q. of 140 is made by only one in 250 to 300."
Terman makes the following classification of intelligence
quotients (from the Stanford Revision) :"

I. Q.        Classification

Above 140    "Near" genius or genius.
120-140      Very superior intelligence.
110-120      Superior intelligence.
90-110       Normal, or average, intelligence.
80-90        Dullness, rarely classifiable as feeble-mindedness.
70-80        Borderline deficiency, sometimes classifiable as
             dullness, often as feeble-mindedness.
Below 70     Definite feeble-mindedness.
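
The classification may be read as a simple lookup. The sketch below uses the labels of the table, shortened; since adjoining classes in the table share an end figure, the treatment of a quotient falling exactly on a boundary is an assumption made here (each boundary figure is given to the higher class, except 140), and the names `BANDS` and `classify_iq` are hypothetical.

```python
# Lower bounds and labels taken from the classification table; how a score
# exactly on a boundary is handled is an assumption of this sketch.
BANDS = [
    (120, "Very superior intelligence"),
    (110, "Superior intelligence"),
    (90, "Normal, or average, intelligence"),
    (80, "Dullness"),
    (70, "Borderline deficiency"),
]


def classify_iq(iq):
    """Return the class label for an intelligence quotient."""
    if iq > 140:
        return '"Near" genius or genius'
    for lower_bound, label in BANDS:
        if iq >= lower_bound:
            return label
    return "Definite feeble-mindedness"


for quotient in (145, 125, 100, 85, 65):
    print(quotient, classify_iq(quotient))
```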






'• Cf. James B. Miner, Deficiency and Delinquency, p. 95.
" Terman, op. cit., pp. 94-98.
' Ibid., p. 79.




Feeble-minded individuals with intelligence quotients be- 
tween 50 and 70 include most of the morons ; those between 
20 and 50, imbeciles ; and those below 20, idiots. 

Are Differences in Intelligence of Degree or of Kind? — 
It makes a great deal of difference to those attempting to 
measure intelligence whether the differences found among 
individuals as to their intelligence is a difference in degree 
or a difference in the kind of mental traits and character- 
istics individuals have. 

If the difference is one of kind then we might expect the 
normal individual to have certain mental traits, powers, 
and characteristics not possessed by the subnormal individual.
On the other hand, if it is simply a difference
in degree, then the idiot possesses some of all the mental 
traits, powers, and characteristics that the normal in- 
dividual or even the genius has. And such would now 
seem to be the consensus of opinion, for a search for a 
qualitative difference between feeble-minded and normal 
individuals has failed to disclose any characteristic or mental
trait found in the normal individual and not possessed
by the feeble-minded to at least a small degree. 

In his early work even Binet considered the difference
between normal individuals and subnormal ones to be one
of kind. In his work entitled Mentally Defective Children
he says:

A second and totally different theory is tenable, and this one
appears to us to be much nearer the truth. It is that a defective
child does not resemble in any way a normal one whose development
has been retarded or arrested. He is inferior not in degree
but in kind. ... An unequal and imperfect development is his
special characteristic. These inequalities of development may vary
to any degree in different subjects. They always produce a want
of equilibrium, and this want is the differentiating attribute of
the defective child.

In Binet's later works he came to the conclusion that
the difference between the normal and subnormal is not
one of kind but one of degree. In his volume entitled The
Intelligence of the Feeble-Minded he writes: "We may 
thus pass in review all our faculties, and determine that 
not one is entirely lacking in them. . . . They always have 
them in some degree. . . . The arsenal of their intellect is 
equipped with all the weapons." 

Dr. Norsworthy published the results of her experiments
in measuring feeble-minded children in 1906. She tested 
a variety of their mental traits, and also the same traits 
of a number of normal school children. In no case did 
she find the normal child having mental traits not possessed 
by the feeble-minded child. The experiments of Pearson 
and Jaederholm coincide with the findings of Norsworthy
and also the later conclusions of Binet. The conclusions 
of these scientists are further confirmed by statistics. The 
subnormal children occupy the lower end of the normal
distribution curve and the greater the degree of subnormality
the fewer the cases found. If they were in a class
by themselves, the number of cases would be less near the
limits of the class and greatest near the middle. But, as
was stated above, the number grows progressively greater
from the lowest mentality to the normal individual.

Choosing Tests to Measure Intelligence. — The selection
of tests that will measure general intelligence is an extremely
difficult problem because of the complex and subtle
character of the mind to be measured. Many psychological
principles must be kept in mind. Just any kind of tests
will by no means satisfy the conditions. We indicated
earlier in the chapter that the type of test chosen depended
on what was conceived to be the nature of intelligence. In
the earlier testing work, for instance, Binet believed that
the essence of intelligence was the capacity to adjust the
attention. He therefore devised tests such as the cancellation
of letters in a specially prepared sheet, so as to bring
into play this faculty. He thought that the ability to discriminate
between two near-lying points of the compass on
the skin was a matter of attention rather than sensation
and used this as a test in the earlier part of his work. We
shall enumerate some of the problems to be solved in the
selection of tests and criticize briefly their attempted solutions.

Tests Must Not Be Influenced by External and Chance
Conditions. — If the tests were affected to any great
degree by school training or environmental conditions they
would not be a measure of native endowment. Measuring
general intelligence is not merely measuring what a child
knows or has retained; it is not a matter of knowledge
but of something essentially dynamic behind and veritably
in this knowledge. It is interfunctioning, harmonious and
correlated in the normal, but discordant and disrelated in
the psychopathic, the intellectually defective.

Ayres^' criticizes the Binet tests on the ground that five 
of them depend upon the child's recent environmental ex- 
perience. It may be, however, that his criticisms are not 
just because the degree to which environment is able to 
affect a child may reveal his native endowment. Environmental
effects are determined and distinguished in various
ways from innate capacity." One method is the effect of
practice on performance. Other things being equal, the
less practice required to make a given unit of gain the 
greater the initial endowment. Great superiority over the 
average in intellectual effort indicates a high degree of 
native endowment. Also the appearance of definite activi- 
ties, such as when a child spontaneously interests himself 
in music or literature, indicates that the child possesses 
strong innate tendencies in those directions. When practice
in varying amounts has already influenced initial
capacities, the cause of the differences in individuals can
be inferred from the effects of equalizing practice.'*
Further practice of equal amounts will reduce differences
where these are not due to innate causes; if, however, the
inequalities increase with further practice, they are not
caused by irregularities in previous practice and are therefore
due to innate conditions.*"

" The Binet-Simon Measuring Scale of Intelligence: Some Criticisms
and Suggestions.

" Cf. Meumann, Vorlesungen, Vol. II., pp. 306-^14.

Only Those Tests Must Be Chosen That Afford a
Decided and Reliably Symptomatic Value, General
Applicability and Possibility of Objective Evaluation.—
As a matter of fact, a scale for the measurement of intelligence
is more limited in scope than the above description
would suggest, since it omits a great many capacities or
abilities that are not supposed to be indicative of the mentality
of an individual. For example, there are tests for
the ability to discriminate two points on the skin and for
the ability to discriminate different shades of color; but we
do not include these tests in our scales of intelligence, because
it is not believed at the present time that such tests
have diagnostic value for distinguishing between different
grades of intelligence.

Tests Must Not Depend Too Much on the Ability
to Use Language. — Just how much the ability to use
language is indicative of intelligence is a question. Have
we a valid test when the ability to pass it depends not
merely on the comprehension of language but also upon
[remainder of this passage illegible]






This language difficulty inherent in the Binet Scale and 
in all the revisions of it became very pronounced as soon
as the use of the scale spread to workers in the various
fields. The clinical psychologists in the large cities were
face to face with the problem of the foreign child, the 
speech-defective, and the deaf children. It was obvious 
that the Binet Scale was not adequate for the mental 
examination of such cases. Other tests not involving 
language were introduced, and these gave rise to the type 
now generally known as performance tests, which will be 
described later. 

The essential characteristic of this type of test is that 
it shall not require any kind of language response on the 
part of the child for an adequate performance of the test. 
It is true that seven of the Binet tests depend on a child's 
ability to read and write, and many others require him 
to express himself through language. The degree to which 
this scale is vitiated because of its dependence on language 
has not been determined. It has been determined, however, 
that there is a high correlation between the ability to use 
language and general intelligence. 

Determining the Age to Which a Test Should Be 
Assigned. — One of the difficult problems in devising a
mental scale has been the determination of the particular 
age to which the parts should be assigned. Suppose, for 
instance, we take one of the questions Binet assigned to the 
five-year-old group. The examiner repeats a sentence with 
ten syllables in it and asks the examinee to repeat it after 
him. Suppose 15 per cent of the children three years old 
can do it successfully, 40 per cent of the four-year-old 
children, 75 per cent of the five-year-olds and 100 per cent 
of the six-year-olds. To what grade should this sentence 
be assigned? Shall we assign it to the six-year-old group 
because this was the lowest group that was able to make 
a score of 100 per cent? Or, if 75 per cent of them are 



I 

J 



I 



THE MEASUREMENT OF INTELLIGENCE 87 

able to do it, is that sufficient T Or, if any percentage less 
than 100 is used, what percentage should it be before itSU 
assigned for that particular age? 

In the standardization of individual tests the usual custom
has been to consider a test properly placed if it is
passed by 75 per cent of the children. The idea is that
if the test is properly placed it will be passed by 50 per
cent of the children who may be considered of average
intelligence, plus 25 per cent of the children who are considered
above the average.*' Of course, theoretically at
least, 25 per cent who are assumed to be below normal will 
be unable to pass the test. 
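
The 75 per cent custom may be applied to the ten-syllable sentence discussed above. The sketch below is an illustration only: the function name is hypothetical, and the rule of taking the youngest age that reaches the required share is one plausible reading of the custom rather than a procedure stated in the text.

```python
def assign_age_level(pass_rate_by_age, required_share=0.75):
    """Return the youngest age at which at least `required_share` of children pass."""
    for age in sorted(pass_rate_by_age):
        if pass_rate_by_age[age] >= required_share:
            return age
    return None  # no age group reaches the required share


# Percentages from the ten-syllable sentence example, as fractions.
sentence_test = {3: 0.15, 4: 0.40, 5: 0.75, 6: 1.00}
print(assign_age_level(sentence_test))        # 75 per cent criterion
print(assign_age_level(sentence_test, 1.00))  # 100 per cent criterion
```

With the 75 per cent criterion the sentence falls to the five-year group; a demand for 100 per cent would move it to the six-year group.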

Binet's guiding principles in the arrangement of tests 
were: (1) Find an arrangement of the tests which would 
cause the average child of any given age to test "at age"; 
that is the average six-year-old must show a mental age 
of six years, the average eight-year-old a mental age of 
eight, and so on. (2) In order to obtain this result he 
found that it was necessary to locate an individual test 
in that year where it was passed by about two-thirds to 
three-fourths of the unselected children. 

Terman'* considers the proper assembling of the tests
one of Binet's biggest problems and one in which he failed 
in many instances, since many of the tests were misplaced 
as much as one year and one was misplaced six years. 

There are many objections to this method of locating 
tests; but on the whole it seems to be about as good as we
are able to get at the present time. 

Problems in Scoring. — There are certain problems in- 
cident to scoring the tests which deserve special treatment. 
We shall state briefly what these problems are and note 
Binet's solution of them. 

S. D. Porteus, Condensed Guide to the Binet Tests (published by
the Training School, Vineland, N. J., April, 1920), Part I, p. 10.
" Terman, op. cit., p. 48.





The All-or-None Method in Scoring. — The question here
is as to whether or not credit should be given for a test
done only in part. For instance: A child is asked to
count backwards from 20 to 0. Suppose he makes three 
errors but counts the rest correctly. Should any credit 
be given for such work? Binet said no. He held that 
a child must complete each test before any credit is given. 
This method of scoring has been severely criticized by 
many writers, notably Yerkes, Bridges, and Hardwick." 
The merits of the all-or-none method of scoring will be 
discussed more fully under A Point Scale for Measuring
Mental Ability in the next chapter.

Shall a Child Be Required to Pass All the Tests at
Each Age-Level? — As was indicated above, each age-level
had from five to seven tests. The question arises as to
whether or not it is necessary for a child to pass all the
tests at each age-level before he shall be allowed to try
those of a higher age-level. Binet realized that children
had lapses in attention and that all the processes of the 
mind did not develop with the same rapidity. Therefore, 
in order to take cognizance of these characteristics, he ar- 
bitrarily said that if a child passes all the tests in any 
age-level save one, he shall be considered to belong to that 
age-level. For instance, at the five-year level there are 
five tests. A child passing four of them would be considered
five years old mentally. But suppose a child should
pass all the tests in the five-year level, and two in the
six-year level, how would we reckon his age? Binet solved
this problem as follows: He took as a basis the highest age
at which the child made a perfect score; that is, the age 
at which he passed all the tests or all save one. Then for 
every five tests he passed beyond this age, one more year 
was added to his mental age. 
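
The reckoning just described may be written out as a short routine. This is a sketch under stated assumptions: the record layout (age level mapped to a list of pass or fail marks) and the function name are hypothetical, and credit is given only for each complete group of five tests, which is the literal reading of the rule as stated here.

```python
def binet_mental_age(results):
    """Reckon mental age from `results`, a dict of age level -> list of
    booleans (True means the test at that level was passed)."""
    # Basal age: the highest level at which all tests, or all save one, were passed.
    basal_candidates = [
        age for age, marks in results.items()
        if marks and marks.count(False) <= 1
    ]
    if not basal_candidates:
        raise ValueError("no age level was passed")
    basal_age = max(basal_candidates)
    # Count the tests passed at levels above the basal age.
    extra_passed = sum(sum(marks) for age, marks in results.items() if age > basal_age)
    # One more year for every five tests passed beyond the basal age.
    return basal_age + extra_passed // 5


child = {
    5: [True, True, True, True, True],
    6: [True, True, False, False, False],
}
print(binet_mental_age(child))
```

For the child in the example, who passes all the five-year tests and two of the six-year tests, the routine gives a mental age of five, since fewer than five tests were passed beyond the basal age.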

A Point Scale for Measuring Mental Ability (Warwick & York,




It sometimes happens that a child will pass all the tests 
in the five-year group, for instance, fail in two or three 
of them in the six-year group, and then pass all in the 
seven-year group. In a case of this kind the problem is, 
What age-level shall be taken as the basis for computing 
mental age? Yerkes and Bridges especially call attention 
to this point. 

With What Tests Shall the Examination Begin? —
Suppose one were called upon to test a boy eight years
old, where on the scale should he begin? Should he begin
with the easiest tests, that is, the tests for the three-year-olds,
and go as high on the scale as the boy can go, or
should one begin with some other age-level, as the eight-year-old
level, and determine the boy's mental age by
going up if the boy can pass tests above that age, or down
if he is unable to pass the tests at this age-level? Binet
solved this problem by the trial-and-error method. Taking
one case with another, his best guess would be that the
child is normal or nearly so. Therefore he would start
enough below the normal age to find an age group he was
pretty sure the child could pass and proceed with the tests 
until the child was unable to pass any tests of a higher 
age-level. 

In making these tests it was assumed that a certain men- 
tal level normally goes with a certain chronological age, 
so that the relation of mental age to chronological age
indicates the amount of discrepancy between the amount of 
intelligence present and that required for normality. The 
custom has been to compute, in a simple way, the difference 
between the two ages, which, when negative, gave the ab- 
solute mental retardation and, when positive, the mental 
acceleration. This method is open to criticism, however. 
It has been shown that the increments from age to age
are not the same and that a child chronologically ten years
old and retarded two years does not have the same retardation
as a child chronologically eight years old and retarded
two years.
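
One way to see the force of this criticism, offered here as an illustration and not as the method of any author cited, is to express the same two-year lag as a fraction of the child's chronological age. The function names are hypothetical.

```python
def absolute_difference(mental_age, chronological_age):
    """Mental age minus chronological age, in years (negative means retardation)."""
    return mental_age - chronological_age


def relative_difference(mental_age, chronological_age):
    """The same difference expressed as a fraction of chronological age."""
    return (mental_age - chronological_age) / chronological_age


# Two children, each retarded two years.
for chronological_age in (10, 8):
    mental_age = chronological_age - 2
    print(
        chronological_age,
        absolute_difference(mental_age, chronological_age),
        round(relative_difference(mental_age, chronological_age), 2),
    )
```

The absolute difference is the same for both children, while the relative lag is one-fifth of his age for the ten-year-old and one-fourth for the eight-year-old.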

The "At Age" and the Normal Child. — Care must be 
taken to distinguish between the "at age" child and the 
normal child. The former is one whose mental and
chronological ages are the same. But a child may be considered
normal if he is mentally a few months younger or
older than the "at age" child. In other words, the "at 
age" children simply constitute the middle section of the 
zone of normality although children advanced or retarded 
a few months are still within the accepted bounds of 
normality. 

Binet Tests Give More Than a Composite Picture. —
While it is the purpose of the Binet Scale to give a composite
picture of a child's mentality, it nevertheless gives
valuable insight into some of the detailed aspects of his
mental functioning. Meumann points out that there are
three distinct types of tests: (a) tests of capacity or endowment,
(b) tests of maturity or development, and (c) tests
of environment or training. Bearing in mind this analysis
of the scale the examiner may learn not only that the child
is either normal, subnormal, or supernormal, but he may
also learn in the case of subnormality, for instance, whether 
the defect is due to training or to native capacity. The 
giving of a conventional list of facts displays the quality 
of training, for instance, whereas the repetition of auditory 
digits or sentences displays, inferentially, at least, native 
capacity. 

Coefficient of Mental Age and the Intelligence Quotient.
— Binet's method of expressing the relation between a
child's mental and chronological ages was simply to say
that the child was six years old chronologically, for instance,
and six-and-a-half years old mentally, or six years
old chronologically and seven-and-a-half years old mentally,
as the case might be. Stern suggested a new method
of evaluating the results of the scale. His plan was to
divide the number of tests a child actually passed by the
number he ought to pass and call the quotient the coefficient
of mental age. By this scheme a child six years old
chronologically, for instance, should pass all the tests up
to and including those of the sixth-year group, or 20 in
all. If he passed but 15, his coefficient of mental age
would be 0.75. If, however, he should be seven years old
mentally, his coefficient of mental age would be 1.17, etc.
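
The arithmetic of the coefficient may be checked as follows. The function name is hypothetical. For the second figure the passage does not state how many tests the seven-year level comprises, so the sketch assumes that 1.17 is simply the ratio of seven years to six.

```python
def coefficient_of_mental_age(passed, expected):
    """Quotient of what the child passed by what he ought to pass."""
    return passed / expected


# Fifteen tests passed out of the twenty expected of a six-year-old.
print(coefficient_of_mental_age(15, 20))

# Assumption: ages in years stand in for test counts in the second example.
print(round(coefficient_of_mental_age(7, 6), 2))
```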

Dr. Terman uses the intelligence quotient, sometimes
spoken of as the I. Q., for the purpose of expressing the
relation between the chronological and mental ages of children.
It is found by dividing the mental age by the
chronological age just as Stern suggested in determining
the coefficient of mental age; but instead of using fractions
to represent this relation, Terman multiplied the quotient
by 100 to avoid them. Thus the "at age" child by the
Terman method of recording age is given an I. Q. of 100,
and any child is considered normal whose I. Q. is between
90 and 110.
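
A minimal sketch of this computation follows. Ages are taken in months so that half-years can be entered as whole numbers; the function names are hypothetical, and treating the limits 90 and 110 as inclusive is an assumption, since the text says only "between."

```python
def intelligence_quotient(mental_age_months: float,
                          chronological_age_months: float) -> float:
    """I. Q. = 100 * mental age / chronological age (ages in months)."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


def is_normal(iq: float, low: float = 90, high: float = 110) -> bool:
    """True when the I. Q. lies within the band the text calls normal.

    Inclusive limits are an assumption made for this sketch.
    """
    return low <= iq <= high


# The "at age" child: six years old both mentally and chronologically.
print(intelligence_quotient(72, 72))  # 100.0

# Six years old chronologically, seven-and-a-half years old mentally.
iq = intelligence_quotient(90, 72)
print(iq, is_normal(iq))  # 125.0 False
```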

Limitations of the Tests. — The Binet Scale does not
pretend to measure the entire mentality of the subject.
The emotions, for instance, are measured only in the remotest
way, if at all. It is not claimed that the scale
reveals special talent, and for this reason it will not serve as
a detailed chart for the vocational guidance of children.
It will, however, roughly bound the limits within which
one's intelligence will permit success. Sharp lines of demarcation
are not to be expected between the various degrees
of mentality.

The Age of Mental Maturity. — Like other body tissues,
the nervous system, which is the physiological basis of
mental life, does not grow indefinitely. Since psychologists
cannot directly measure the course of growth of the nervous
system in living children, they measure it indirectly by
measuring the behavior of the child in taking mental tests.
Scientists are not agreed, from tests thus far made, as to
just when "native intelligence," or mental growth, has
reached maturity. Practically all are agreed simply that
maturity comes somewhere in the 'teens and that beyond
this there is little or no development of this native capacity.
Terman says:

Native intelligence, in so far as it can be measured by tests
now available, appears to improve but little after the age of
15 or 16 years. It follows that in calculating the I. Q. (intelligence
quotient) of an adult subject it will be necessary to disregard
the years he has lived beyond the point where intelligence
attains its final development. Although the location of this point
is not exactly known, it will be sufficiently accurate for our purpose
to assume its location at 16 years. Accordingly, any person
over 16 years of age, however old, is for purposes of calculating
I. Q. considered to be just 16 years old. If a youth of 18 and
a man of 60 years both have a mental age of 12 years the I. Q.
in each case is 12 ÷ 16 or 0.75.
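
The rule in this passage amounts to limiting the chronological age to 16 years before dividing. The sketch below shows that reading; the function and constant names are hypothetical, and the result is left as a fraction, as in the quotation, rather than multiplied by 100.

```python
ADULT_AGE_LIMIT_YEARS = 16  # the point assumed in the quotation above


def adult_quotient(mental_age_years: float,
                   chronological_age_years: float,
                   age_limit: float = ADULT_AGE_LIMIT_YEARS) -> float:
    """Quotient with the chronological age counted as no more than age_limit."""
    if chronological_age_years <= 0:
        raise ValueError("chronological age must be positive")
    effective_age = min(chronological_age_years, age_limit)
    return mental_age_years / effective_age


# A youth of 18 and a man of 60, each with a mental age of 12 years.
print(adult_quotient(12, 18))  # 0.75
print(adult_quotient(12, 60))  # 0.75

# Below the limit the actual chronological age is used.
print(adult_quotient(12, 12))  # 1.0
```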

Porteus claims that the recent work of the army ex- 
aminers has shown that the age of maturity is much below 
the level set by Terman and that if the Terman standards 
are used, it is possible to diagnose an adolescent as being 
at the feeble-minded level when as a matter of fact he is 
comparatively little below the average of the ordinary 
population. 

Yerkes and Bridges say in connection with this question
that "it seems highly probable that the adult level is attained
as early as the sixteenth year."

Spearman and Kuhlmann think that at the age of 15
the native intelligence is mature. The former says the
fact that mental ability reaches its full development about
the period of puberty is still further evidenced by
physiology, for the human brain has been shown to attain
its maximum weight between the ages of 10 and 15 years.
On the other hand, Wallin thinks we should have more
evidence before choosing a fixed age for the maturity of
native intelligence.

Criticisms of the Binet Tests. — The criticisms of the
Binet tests are many and varied. Some are favorable and
some are unfavorable. We shall note first some of the
unfavorable criticisms. One of the most radical and severe
criticisms of the Binet Scale has been made by Haberman,
who, speaking of the Binet tests, says: "Like the
Freudian delusion, this Binet dilettanteism has taken us
by storm and shows splendidly our native gullibility and
likewise an interesting phase of the hysterical lay-medical
activity of the times."

One of the early critics of the Binet Scale was Dr.
Leonard P. Ayres. He criticized the tests from the standpoint
of their content. His criticisms fall under six heads:
(1) The tests depend predominantly on the child's ability
to use words fluently, and only to a limited extent on his
ability to perform acts. (2) Five of the tests depend upon the
child's recent environmental experience; hence it is questionable
whether they test native endowment. (3) Seven
of them depend upon his ability to read and write, which
again raises the question as to whether they test native
ability or school achievement. (4) Too great weight is
given to tests of ability to repeat words and numbers.
(5) Too great weight is given to "puzzle tests." (6)
Unreasonable emphasis is given to tests of ability to define
abstract terms. Since Ayres made his criticisms in
1911 much has been learned about the Binet tests, and
we now know that many of the criticisms made by him
are not so serious as they looked at that stage of their
development.

C. Spearman, "The Heredity of Ability," Eugenics Review, 1914,
pp. 229-237.

J. E. Wallace Wallin, "Re-Averments Respecting Psychoclinical
Norms and Scales of Development," Psychological Clinic, 1913,
pp. 89-96.

Haberman, op. cit., p. 14.

The main defects of the Binet Scale according to
Meumann are: (1) The single tests are not rightly graded
according to their difficulty. (2) The tests of each kind
are not sufficiently numerous, and those of a particular
kind are not repeated year after year so as to trace the
child's development in the various faculties of the mind.
The different age-groups deal with entirely different mental
functions; for instance, one of the tests in the five-year
group is to repeat a sentence of ten syllables. This is
a test in auditory memory. The next time the child is
called upon to show his ability in auditory memory is
when he is asked in the eight-year group to repeat five
digits. The interval is three years, and the scale cannot, therefore,
be a very accurate measure of a child's rote memory
development. (3) The tests determine quite different
capabilities; that is, they are not systematically arranged
according to definite points of view. (4) The exact sum
of the whole testing has not been decided upon.

The scale is further criticized on the ground that there 
are two tests in some age-groups which depend on memory. 
This gives undue emphasis to one particular trait to the 
exclusion of others, which may be as much of a measure 
of mentality as memory. 

Limits of Traits Not Determined by the Binet Scale. —
The Binet Scale fails to determine the limits of a trait
for two reasons: (1) The unit of accomplishment is so
large that small degrees of progress are not recorded. (2)
In one age-group auditory memory, for instance, may be
tested whereas in the next group a different kind of memory
is tested. If the child passes the test in auditory memory
he is not tested again in that particular trait until three
or four age-groups are passed. Hence his ability in that
particular trait is not determined by the test taken.

Myers points out that the determination of endowment
and the measure of general intelligence have been approached
in a more strictly psychological manner by tests
different from the Binet tests. In his opinion, the Binet
tests are tests of production rather than psychological tests.
He says: "They determine how much an individual can
work, how much he knows — not how he works, how he
knows. A man's productivity, of course, is what we want
to ascertain in everyday life. We do not care how a man
comes to use or acquire his powers; we are content with a
mere dynamometric or other record of his prowess. But
this aspect cannot properly be called the psychological
aspect."

The Binet Scale Criticized on Other Points. — The scale 
is further criticized because there is not the same number 
of tests at each age-level. The four-year-old group con- 
tains but four tests whereas the other groups contain 
five ; and it may be easier to pass three tests out of four 
than four out of five. 

The absolute amount of retardation, that is, the difference
between the mental and chronological ages, is not a
comparable quantity when different age levels are in
question. Or again, two children with the same mental
and chronological ages may have a very wide variation
in the degrees of maturation of their native capacities.
For instance, a boy ten years old mentally and chronologically
may have passed all the tests consecutively up to
and including the ten-year group but be unable to go
beyond that age-level. Another boy, mentally and
chronologically the same age as reckoned by the Binet
Scale, may have been unable to pass all the tests in each
age group beyond the eighth year, but because he was
able to pass ten tests above the eight-year level he was
given a mental age of ten the same as the other boy. The
range of the distribution of passes and failures is, as a
rule, very much wider in the feeble-minded than in the
normal individual. That is, a subnormal child may be
very proficient in one trait and deficient in another. Some
investigators have found this range to be twice that of the normal
individual.

British Medical Journal, No. 2613, pp. 196-197.

A mind with the least variation from the normal seems 
to be of a higher type than one which varies a great deal ; 
for example, a child who passes all the tests in the eight- 
year group and one in the nine-year group and can go no 
further would be considered of a higher type than a child 
who passes all the tests in the seven-year group, two in 
the eighth, and one in each of the next three groups, mak- 
ing him eight years old mentally. 

Many criticize the Binet Scale because it does not display
more completely the character of the mind being
tested. Seashore, Pyle, and others demand a more fundamental
analysis and a more exact determination of mental
ability than is possible to obtain by the Binet tests. Thus
Seashore writes:

Retardation does not follow a common flat level any more than
growth does, nor even nearly so much. A child develops one
capacity several times as fast, and often at the expense of another
faculty. This differentiation is even more striking in retardation.
What is more, those who employ the tests for practical purposes
should not be satisfied with a flat mental age. ... In a study
of the normal individual we seek to discover his fortes and his
faults, in short, to discover his particular deviation from the
norm of the common level. There is no reason why the Binet-Simon
tests should not develop into specific measures of the
relative rank, or age, of more specific capacities and powers,
such as reasoning ability, sensory observation, memory, imagination,
initiative, emotional life, self-control, etc. A child may be
at the mental age of six in one capacity and twelve in another,
and the important thing to know about the individual is the
difference and direction of unsymmetrical development. It may
be that a general flat-age test must be retained for certain
purposes, but even that must be interpreted in the light of
measures of specific capacities. Only by extension in recognition
of this principle can any set of tests be of permanent value.

Journal of Educational Psychology, Vol. 3, p. 50.

Pyle criticizes the tests on the ground that there is
no common plan running through the tests for the successive
years. He argues for "a series of tests for determining
the degree of development of logical memory,
rote memory, attention, imagination, association, and two
aspects of mind more complex, learning capacity and reasoning.
... It is more important, it seems to me, to know
specifically the condition of the child with reference to
the development of the separate mental traits than to
know his average performance with respect to them
all."

Porteus has pointed out some of the limitations of the
Binet tests as follows: They do not constitute a perfect
instrument of research and have failed to fulfill many
expectations founded on them. They cannot be relied
upon for accurate diagnosis of the highest grades of
feeble-mindedness; he points out that there are other factors
besides the degree of intellectual development which
have a bearing on social competency. Many intellectually
inferior persons are judged normal by social criteria because
they possess practical abilities not evaluated by
Binet. On the other hand, there are intellectually normal
individuals who are socially unfit because of instability,
weakness of temperament, volition, and other traits not
revealed by the Binet examination.

...... No. 19, Training School at Vineland, N. J., pp. 1-3.




He further criticizes the tests on the ground that they
are too literary. Fifty-seven questions out of 74 require
an oral language response. Language development, either
from the standpoint of comprehension, range of vocabulary,
description, or defining power, is the main capacity
tested in 50 per cent of the cases. Thirty per cent of the
tests depend on previous educational training. Forty-eight
per cent of the tests depend on mediate and immediate
memory.

His general criticisms are that the tests are too literary;
that they favor the glib-tongued, quick-thinking child
who has had good educational advantages, who memorizes
easily and therefore shows good scholastic promise.
This is why the Binet tests correlate so highly with school
training. It has been mainly through comparison with
teachers' judgments that the validity of the tests has
been established. There are cases, however, in which
children with high Binet records show little power of
adaptation to school conditions and also cases in which
the class dullard "makes good." The main point that
Porteus makes is that the Binet tests will not
measure accurately all individuals. On the other hand,
it is not necessary to stretch out one series of tests so
thin as to cover the whole field anyway. The limitations
of the tests pointed out by Porteus do not mean that
the Binet tests are not to be used in diagnosis but that
they must not be used to the exclusion of all others for
diagnostic work.

These defects were not unknown to Binet. In regard
to mental age he writes thus: "It has no bearing on
the cause of retardation, nor upon its peculiar nature,
nor upon the means of rectifying it." In regard to the
fallacy of regarding retardation as merely equivalent to
a lower mental age, he says:

A defective child does not resemble in any way a normal one
whose development has been retarded or arrested. He is inferior
not in degree but in kind. The retardation of his development
has not been uniform. Obstructed in one direction, his development
has progressed in others. To some extent he has cultivated
substitutes for what is lacking. Consequently, such a child is
not strictly comparable to a normal child younger than himself.

Mentally Defective Children, English trans., p. 12. Ibid., p. 13.

Some Favorable Criticisms of the Binet Scale. — In spite
of its defects the Binet Scale has been applied in practically
every civilized country and has received general
approval. Its shortcomings are in its details and not in
the fundamental principles. The opinions of those best
qualified to sit in judgment on it are to the effect that
it has demonstrated its value and that it is destined
through its revisions and corrections to be a most valuable
instrument in the measuring of mentality.

Kuhlmann writes thus:

There can be no question about the fact that the Binet-Simon
tests do not make half so frequent or half as great errors in the
mental ages [of feeble-minded children] as are included in gradings
based on careful, prolonged general observation by
experienced observers.

Meumann writes as follows:

All the different authors who have made these researches [with
Binet's method] are in a general way unanimous in recognizing
that the principle of the scale is extremely fortunate, and all
believe that it offers the basis of a most useful method for the
examination of intelligence.

Stern says:

That, despite the differences in race and language, despite the
divergences in school organization and in methods of instruction,
there should be so decided agreement in the reactions of children
— is, in my opinion, the best vindication of the principle of the
tests that one could imagine, because this agreement demonstrates
that the tests do actually reach and discover the general developmental
conditions of intelligence (so far as these are operative
in public-school children of the present cultural epoch), and not
mere fragments of knowledge and attainments acquired by chance.

Goddard says:

It is without doubt the most satisfactory and accurate method
of determining a child's intelligence that we have, and so far
superior to everything else which has been proposed that as yet
there is nothing else to be considered.

Goddard not only defends the Binet Scale as a whole 
but defends the age grouping on the ground that it con- 
forms to the normal distribution curve. He says that if 
the questions were not properly grouped, age for age, but 
were too hard or too easy, the largest group would not be 
one "at age" but would be a year below or a year above 
according as they were too hard or too easy. 

Other Problems That Confront Those Testing Intelli- 
gence. — A number of other interesting and perplexing 
problems confront those attempting to measure the 
growth of general educational capacity. 

1. Does the defective progress normally to a certain point
and then suffer arrest, or has mental growth been retarded
from birth? — The study of subnormal children shows
clearly that they are inferior and below par from the beginning.
They suffer no arrest in their development any
more than a normal child does. They simply develop
more slowly and hence never reach the stage normal
children reach when they are mature. This applies not
only to their mental development but also to their physical
development. They are slower in learning to walk, talk,
creep, and sit up, in the appearance of their teeth, etc.

H. H. Goddard, "The Binet Measuring Scale of Intelligence:
What It Is and How It Is to Be Used," Training School Bulletin,
Vineland, N. J., 1912.

2. Does the defective child have the same mental equipment
as the normal child of the same mental age? — According
to the best evidence that can be secured along this
line, the defective child seems to have a mental equipment
different from a normal child of the same mental
age. For example, a child twelve years old chronologically
and seven years old mentally differs from a normal
seven-year-old child in certain specific habits and bits of
information. The defective child has accumulated certain
of these habits and bits of knowledge simply because
he has lived longer and has had more experience. His
native endowment is more nearly matured than that of
the normal child. But maturity in the defective child
carries him only part way up the scale and it is in this
sense that he is mentally equal to the normal child. The
instincts differ, especially when a defective adolescent is
compared with a normal child of the same mental age.

3. Do feeble-minded children mature mentally at the
same chronological age as normal children? — Measurements
taken in training schools show that defective children
continue to grow mentally until well advanced in
adolescence. Their development is slow, and there is relatively
more variation in the development of the traits
than in normal children, but it may be said to be continuous.

4. Are subnormal children equally deficient in all abilities?
— Just as normal children develop one ability more
than another, so subnormal children may be quite proficient
in some intellectual traits and very deficient in
others. Visual acuity and certain types of memory, for
example, may be as well developed in the subnormal child
as in the normal and in some cases even better. Indeed,
it is this marked variation of mental abilities from the
norm that is in part responsible for subnormality. Not
only do the abilities, taken as a class, develop slowly, but
they develop very irregularly. The subnormal child's several
intellectual abilities reach several levels on the intellectual scale.

Perhaps enough has been said about the Binet Scale
and the many problems that confront those attempting to
use it. We have given a brief history of some of the
more important problems and criticisms of the Binet
Scale. All are agreed that it is far from a perfect measuring
instrument. It has been open to many criticisms. Yet
in spite of the criticisms, the scale enables
one to make a rough classification of pupils in a comparatively
short time and by rather simple means. Its
value is perhaps practical and pedagogical rather than
theoretical and psychological. There is no question about
its being considerably in advance of the estimates of intelligence
based on school performance or on the biased
judgments of teachers or parents.

A summary criticism and an evaluation of attempts to
measure intelligence will be given at the end of the next
chapter.

Bibliography 

1. Ayres, Leonard P., The Binet-Simon Measuring Scale for
Intelligence: Some Criticisms and Suggestions (Russell Sage
Foundation, New York, 1911).

2. Binet, Alfred, "L'intelligence des imbéciles," L'Année
Psychologique, 1909.

3. Burt, Cyril, "The Measurement of Intelligence by the
Binet Tests," Eugenics Review, 1914, pp. 6, 36-50, 140-152.

4. Goddard, H. H., "The Binet Measuring Scale of Intelligence:
What It Is and How It Is to Be Used," Training School
Bulletin, Vineland, N. J., 1912.

5. Hollingworth, Leta S., The Psychology of Subnormal
Children (The Macmillan Co., 1920).

6. Haberman, J. Victor, The Intelligence Examination and
Evaluation (American Medical Association, Chicago, 1915),
Pamphlet, 16 pages.

7. Kuhlmann, F., "The Binet and Simon Tests of Intelligence
in Grading Feeble-Minded Children," Journal of Psycho-Asthenics,
1912, Vol. 16, pp. 173-193.

8. Miner, James Burt, Deficiency and Delinquency (Warwick
& York, 1918).

9. Porteus, S. D., Condensed Guide to the Binet Tests, Part I
(Training School, Vineland, New Jersey, 1920).

10. Pyle, W. H., "A Suggestion for the Improvement and
Extension of Mental Tests," Journal of Educational Psychology,
Vol. 3, pp. 95-96.

11. Rusk, Robert R., Experimental Education (Longmans,
Green & Co., 1919).

12. Spearman, C., "The Heredity of Ability," Eugenics Review,
1914, pp. 219-237.

13. Seashore, C. E., "The Binet-Simon Tests," Journal of
Educational Psychology, Vol. 3, p. 50.

14. Schwegler, Raymond A., The Binet-Simon Scale of Intelligence
(University of Kansas, 1914).

15. Stern, William, The Psychological Methods of Testing
Intelligence, translated by Guy Montrose Whipple (Warwick &
York, 1914).

16. Terman, Lewis M., The Measurement of Intelligence
(Houghton Mifflin Co., 1916).

17. Thorndike, Edward L., Educational Psychology, Vol. 3
(Teachers College, Columbia University, 1914).

18. Titchener, E. B., "Wilhelm Wundt," American Journal
of Psychology, Vol. 32, April, 1921, pp. 161-178.

19. Wallin, J. E. Wallace, "Re-Averments Respecting
Psycho-clinical Norms and Scales of Development," Psychological
Clinic, Vol. 7, 1913, pp. 89-96.

20. Whipple, Guy Montrose, Manual of Mental and Physical
Tests, Parts I and II (Warwick & York, 1915).

21. Yerkes, Bridges, and Hardwick, A Point Scale for
Measuring Mental Ability (Warwick & York, 1915).




THE MEASUREMENT OF INTELLIGENCE — Continued




The discussion of the measurement of intelligence in 
this chapter will be divided into three parts: (1) the 
extension and revision of the Binet Scale and other 
measures of intelligence; (2) group intelligence scales; 
and (3) a summary and evaluation of measures of in- 
telligence. 

I. The Extension and Revision of the Binet Scale and
Other Measures of Intelligence

Among the many revisions and extensions of the Binet
Scale, perhaps the most important one in America is that
made by Dr. Lewis M. Terman, known as the Stanford
Revision. Other noted revisions and extensions are those
made by Goddard, whose revision was the first to appear in
America; the Point Scale referred to above, by Yerkes,
Bridges, and Hardwick; and the translation and revision
by Kuhlmann.

It is beyond the scope of an elementary work of this 
kind to give a detailed discussion of the various revisions 
and criticisms of the Binet Scale. It is the purpose rather 
to give some of the more salient features of the attempts 
at revisions and criticisms and to refer the reader to a 
more exhaustive treatment of these things elsewhere. 

The Stanford Revision of the Binet Scale. — To render
the Binet Scale applicable to American conditions, some
reorganization was necessary, first, because conditions in
America were different from those in France, and second,
because it was felt that while the construction of the
Binet Scale was fundamentally right yet there were a
number of details that needed revision. According to
Terman's view, many of the tests were misplaced. There
was a dearth of tests at the higher mental levels; the
procedure in giving the tests was not standardized; and
many minor changes were necessary.

The revision was a long process involving several years
of tedious work in examining and reexamining approximately
2,300 subjects, of whom 1,700 were normal
children, 200 defectives and superior children, and about
400 adults.

After Terman had compared the data from tests made
with the Binet Scale in various parts of the world he
decided to provide 40 additional tests to the Binet Scale
to be used in the tryout for revision. This would make it
possible to eliminate some of the least satisfactory ones
and at the same time allow six tests for each age group
instead of five. Care was taken to secure children whose
ages did not vary more than two months from a birthday;
that is, the ages of the eight-year-old group, for instance,
ranged from seven years ten months to eight years two
months. All the children within two months of a birthday
were tested in order to avoid accidental selection.
Tests of foreign-born children were eliminated in the final
treatment of the data because they clearly did not represent
a normal group. The children's responses to the
tests were recorded almost verbatim. The revision of the
scale below the 14-year level was based almost entirely
on 1,000 unselected children. The object was to arrange
the tests in such a way that the median mental age of
the unselected children of each age-group would coincide
with the median chronological age. For example, a
correct scale must cause the average child five years old
chronologically to test five years old mentally, and the
average six-year-old child to test six years mentally and
so on.

This relation was expressed in what is known as the
intelligence quotient discussed in the previous chapter. The
scale above the 14-year age-level was based on the results
of testing adults, since children in the grades older than
14 would be classified as retarded. The validity of particular
parts of the scale was determined by dividing the
children of each age-level into three groups according to
intelligence quotients: (1) those testing below 90; (2)
those testing between 90-109; and (3) those with an
intelligence quotient of more than 110. The percentage
of passes at or near that age-level was then ascertained
separately for the three groups. If a test failed to show
a decidedly higher proportion of passes in the superior
group than in the inferior group, it was discarded as
unsatisfactory.
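
The check just described can be sketched as follows. This is an illustration of the idea only, not Terman's actual procedure: the text does not say how large a difference counts as "decidedly higher," so the `min_margin` parameter is a hypothetical threshold, an I. Q. of exactly 110 is placed in the superior group by assumption, and the sample records are invented for the example.

```python
from collections import defaultdict


def iq_group(iq: float) -> str:
    """Assign one of the three groups named in the text."""
    if iq < 90:
        return "inferior"
    if iq < 110:
        return "average"
    return "superior"  # 110 itself is placed here by assumption


def pass_rates(records):
    """records: (iq, passed) pairs for one test at one age-level."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for iq, passed in records:
        group = iq_group(iq)
        totals[group] += 1
        passes[group] += int(passed)
    return {group: passes[group] / totals[group] for group in totals}


def keep_test(records, min_margin: float = 0.2) -> bool:
    """Keep the test only if the superior group passes it decidedly more
    often than the inferior group (min_margin is a hypothetical cutoff)."""
    rates = pass_rates(records)
    if "superior" not in rates or "inferior" not in rates:
        raise ValueError("need children in both the superior and inferior groups")
    return rates["superior"] - rates["inferior"] >= min_margin


# Invented records: (I. Q., whether the child passed the test).
discriminating = [(85, False), (80, False), (88, True), (100, True),
                  (95, False), (115, True), (120, True), (112, True)]
flat = [(85, True), (80, True), (115, True), (120, True)]

print(keep_test(discriminating))  # True
print(keep_test(flat))            # False
```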

The scale when finally completed consisted of 90 tests,
36 more than were included in the Binet 1911 Scale.
There are six tests for each age-level from 3 to 10, eight
at 12, six at 14, six at "average adult" age, six at
"superior adult" age, and 16 alternative tests. The alternative
tests are to be given only when some of the
regular tests have been rendered unfit because of coaching
or for other reasons.
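
The count of 90 can be verified from the figures just given. The short tally below uses only numbers stated in the paragraph; the variable names are illustrative.

```python
# Regular tests at each level, as listed in the text.
regular_tests = {level: 6 for level in range(3, 11)}  # age-levels 3 through 10
regular_tests.update({12: 8, 14: 6, "average adult": 6, "superior adult": 6})

alternative_tests = 16

regular_total = sum(regular_tests.values())
scale_total = regular_total + alternative_tests

print(regular_total)     # 74
print(scale_total)       # 90
print(scale_total - 36)  # 54, the size implied for the Binet 1911 Scale
```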

A comparison of this scale with the 1911 edition of the 
Binet Scale above the adult group shows that two tests
are eliminated and 29 are relocated, of which 25 are 
moved downward and four upward. Eighteen of the 29 
relocated tests were moved down one year, four were 
moved down two years, two were moved down three years, 
one was moved down six years, three were moved up 
one year and six moved up two years. 

To Find the Mental Age of a Child by the Stanford
Revision. — Since there are six tests for each age-group
from III to X, each test passed counts two months towards
mental age. For instance, if a child were to pass
all the tests in the six-year group and two in the seventh,
he would be considered six years and four months old
mentally. Since there is no 11-year group in the Stanford
Revision and there are eight tests in the 12-year group,
these tests must cover a period of 24 months and each
test passed should add three months to a child's mental
age.
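
The reckoning may be sketched as below. Only the two credit values stated in the text are included (two months per test for groups III to X, three months per test for the 12-year group); the higher levels are left out because their credits are not given here. The term "basal year" for the highest group passed in full, and all function names, are assumptions of this sketch.

```python
# Months of credit per test, limited to the levels the text specifies.
MONTHS_PER_TEST = {level: 2 for level in range(3, 11)}
MONTHS_PER_TEST[12] = 3


def mental_age_months(basal_year: int, passes_above: dict) -> int:
    """Mental age in months.

    basal_year: highest age-group in which every test was passed.
    passes_above: {age_group: number of tests passed} beyond the basal year.
    """
    months = basal_year * 12
    for level, count in passes_above.items():
        if level not in MONTHS_PER_TEST:
            raise ValueError(f"no credit value given in the text for level {level}")
        months += count * MONTHS_PER_TEST[level]
    return months


def as_years_and_months(months: int) -> tuple:
    """Split a month count into (years, months)."""
    return divmod(months, 12)


# All tests passed in the six-year group and two in the seventh.
print(as_years_and_months(mental_age_months(6, {7: 2})))    # (6, 4)

# All tests passed through year 10 and four of the eight 12-year tests.
print(as_years_and_months(mental_age_months(10, {12: 4})))  # (11, 0)
```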

The Picture Completion Tests. — A notable attempt to
extend the Binet Scale is found in the picture completion
tests devised by Healy. His object was to test intelligence
and at the same time eliminate the language factor.

Since modifications of the Ebbinghaus Completion 
Method are now quite extensively used for language and 
seem to correlate very highly with well-known tests of
general intelligence, it seemed reasonable to Healy to sup- 
pose that practically the same sort of ability would be 
required to complete a picture as to complete a sentence. 
He therefore devised the Picture Completion Test. 

In giving these tests, the essential thing for the pupil
is to see that something is lacking in the general situation
in the picture. The child is asked to supply the missing
parts to complete the scheme. In the Picture Completion
Test, the choice of a missing part is limited to blocks
supplied to the subject, whereas in the language-completion
tests in general use, the subject has the entire
range of his vocabulary from which to supply the missing
word.

The material for the Picture Completion Test consists
of a brightly colored picture 10 by 14 inches in dimensions.
It represents a barnyard scene in which ten simple
activities are going on. There is no obvious connection
between the activities, but each is of such a nature as
to appeal to the childish imagination. An object necessary
for the completion of any one of the activities is
omitted and it becomes the task of the examinee to find
the most appropriate object. For example, two boys are
playing with a football. One of the boys has just kicked
it into the air and the other boy has his hands in a
position for catching it, but the significant thing, the
football, has been omitted and a blank space instead of the
football appears between the two boys.

In addition to the ten appropriate blocks for completing 
the pictures there are 40 others from which the subject 
may choose, ten of which are blank while the other 30 
bear pictures of objects. The picture board contains 10 
apertures each of which is one inch square, and the blocks 
are so cut that any block will fit any aperture. No indica- 
tion of the correct solution may be gained from the back- 
ground of the picture; but the subject must grasp the 
meaning in order to meet the requirements. 

A test of this kind offers some measure of a child's 
apperceptive ability and shows how well he is able to 
use his past experience in the solution of new and novel 
situations. This factor corresponds in the main to the 
definitions of intelligence given by Stern, Burt, Binet, and 
others. It is claimed that this test differs from the or-
dinary picture-puzzle tests inasmuch as it demands a
choice on the part of the child. Healy claims the tests
correlate well with the apparent mentality of both the
delinquents and the normal individuals. Most of the
work with this test has been done in corrective and pro- 
tective institutions. 

The Form Board for Measuring Intelligence. — The form
board, which in some respects is very much like Healy's
Picture Completion Test, was originally devised by Seguin.
A false conception of psychology, the old faculty
psychology, was responsible for its creation, but it has
since proved to be of value in mental diagnosis. Owing
to the fact that Seguin believed defective children dif-
ferent from normal children in kind of mentality, he
thought they would demand a different kind of test to
measure their intelligence. He believed that general im- 
provement could be brought about by training in specific 
tasks; that the mind could be divided into separate en- 
tities such as attention, will, memory, imagination, and 
the like, and that training in each of these gave a general 
training of the whole mind. 

The form board, which consists of a board usually about 
14 by 20 inches in dimensions, has a number of irregular- 
shaped apertures in it with a number of blocks that will 
fit the various apertures. Each block will fit but one 
aperture and the test consists in determining how quickly 
pupils will assemble the blocks into their appropriate 
apertures. Feeble-minded people have little sense of form
and proportion, hence they must try many times before 
they can find the right blocks for the various apertures. 

A Scale of Performance Tests. — Pintner and Paterson
have designed and assembled a group of tests with the
idea of making a scale which will supplement the intelli-
gence scales now in use. Their scale is the result of an
attempt to measure the mentality of deaf children and
had its beginning in 1914. Not only is it desirable to
measure the mentality of deaf children, but there are
many other types that cannot be measured by the or-
dinary intelligence scales. The speech defective, the back-
ward child, the foreign child, and many types of sub-
normal children do not respond to these tests in such a
way as to reveal their mentality. A battery of per-
formance tests, therefore, was thought by these men to
be the sine qua non for measuring their general intelligence.
The term "performance tests" is used here in a restricted
sense and indicates a group of tests which involves a great
deal of manipulation with the hands and a minimum
amount of language responses.

The scale as reported by the authors consists of a group 
of fifteen tests as follows:*

1. The Mare and Foal Picture Board, a modification of the
original as designed by Healy.
2. The Seguin Form Board, Twitmeyer's adaptation of the
Goddard or the Goddard Board itself.
3. The Five-Figure Board, devised by Paterson.
4. The Two-Figure Board, devised by Pintner.
5. The Casuist Form Board, a copy of the original board
devised by Knox.
6. The Triangle Test, devised by Gwyn.
7. The Diagonal Test, devised by Kempf.
8. Healy Construction Puzzle A, devised by Healy.
9. The Manikin Test, devised by Pintner.
10. The Feature Profile Test, devised by Knox and Kempf.
11. The Ship Test, devised by Glueck.
12. The Picture Completion Test, devised by Healy.
13. The Substitution Tests, devised by Woodworth.
14. The Adaptation Board, devised by Goddard.
15. The Cube Test, devised by Knox and modified by Pintner.

A brief description of the first test will give an idea
of the general nature of the series.

The Mare and Foal Picture Board is a board measuring 
29 by 24% centimeters and one centimeter thick upon 
which a colored picture is pasted. The picture represents 
a mare and foal in a field with two sheep lying on the 
ground and three chickens in the foreground. In the
background two houses are seen in the distance. Eleven 
pieces of irregular shape have been cut from the picture. 
Each piece represents certain parts of the animals or of
the scene. In giving the test the board is placed in front
of the child with the pieces scattered over the top. The
instructions are to put the pieces in place as quickly as
possible without making any mistakes. The examiner
watches the performance of each child and records the
number of errors made. An error is an attempt to put a
piece in the wrong place. The time allowed is five minutes.

* Rudolf Pintner and Donald G. Paterson, A Scale of Performance
Tests (D. Appleton Co., New York, 1917), pp. 23-24.

Although the other 14 tests differ from this one, yet the 
general plan is the same. In selecting the tests for this 
scale, the aim was to obtain as many kinds as possible so
that the various factors entering into the complex known
as intelligence might be brought into play. Care was taken
not to select the performance of a specific activity that was
likely to have been learned by the child. The tests must
be of such a nature that no verbal instructions are neces- 
sary. 

A Point Scale for Measuring Mental Ability. — There
have been many differences of opinion as to whether the
"all-or-none" method of scoring used in the Binet
Scale was yielding the most satisfactory results. Many
psychologists have been convinced that a more nearly
perfect picture of a child's mentality might be drawn 
if the scale were a little finer; that is, if the units of 
accomplishment were made smaller so that a child would 
get credit for any part of a test properly completed 
instead of having to complete the whole test in order 
to score. 

With this idea in mind, Dr. Robert M. Yerkes, assisted 
by James W. Bridges and Professor Rose S. Hardwick, 
undertook the task of revising the method of scoring used 
in the Binet Scale. Their intentions were at first simply 
to devise a better method rather than to attempt to modify 
the Binet Scale. A number of preliminary tests were 
given to approximately 1,000 school children and inmates
of a psychopathic hospital to determine the value of the
single-series and the partial-credit ideas before attempt- 
ing to develop a more highly satisfactory form of point 
scale. The main defect they sought to correct may be
illustrated by the following example taken from the Binet
Scale.

Suppose three pupils, A, B, and C, were asked to count 
backwards from 20 to 0. A makes the count without error, 
B makes two errors, and C fails entirely. In recording 
the results by the Binet method, A is given a perfect score, 
and B and C are recorded simply as failures and are put
in the same class. Now it is evident that B's score is much
nearer A's score than C's. The point scale would
give A a perfect score, B a certain number of credits, and 
C a score of zero, which would be a more equitable rating 
of the three abilities. 
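
The contrast between the two methods of scoring may be sketched as follows. The sketch is an illustration only: the text says elsewhere that four gradations are recognized in counting backward, but it does not give the error limits for each, so the limits in `point_credit` (and both function names) are hypothetical.

```python
def all_or_none(errors):
    """Binet-style credit: 1 only for a faultless count, otherwise 0.

    errors is None when the pupil fails to make the count at all.
    """
    return 1 if errors == 0 else 0


def point_credit(errors):
    """Partial credit on a hypothetical four-step schedule (3, 2, 1, 0)."""
    if errors is None:
        return 0  # failed entirely
    if errors == 0:
        return 3  # perfect count
    if errors <= 2:
        return 2  # assumed limit for the second gradation
    if errors <= 4:
        return 1  # assumed limit for the third gradation
    return 0


pupils = {"A": 0, "B": 2, "C": None}
for name, errors in pupils.items():
    print(name, all_or_none(errors), point_credit(errors))
```

Under the all-or-none rule B and C both receive zero; under the point schedule B stands between A and C, which is the more equitable rating the text describes.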

It is claimed also that the point system gives due credit 
to the more difficult reactions which are many times not 
properly weighted by the other method. By making the 
units of accomplishments smaller, it is possible to note gains 
made in case the test is repeated. In the case cited above, if 
the test were again given to C and he was able to count 
from 20 to 0 with only four mistakes, his gain would be
noted, but by the "all-or-none" method no progress would
be recorded until he was able to make the count without
error. It is claimed by the authors that the point method
of scoring tends to minimize the influence of the personal
equation of the examiner and, in doubtful cases where the
examiner is not quite sure whether the examinee made a 
perfect score or not, the pupil may be more accurately 
graded. 

It is argued that the ideal examination question is one 
upon which the abler students will make a high score and 
on which the poorly prepared will be able to give some 
answer even though the answer is not of such a nature 
as to merit a perfect score. The object is not simply to
know that certain individuals have a passing knowledge 
of the topic, while certain others have not, but to know 
how much the better candidates surpass the minimum re- 
quirement and to what degree the less able students fall 
short of it. A point scale opens the way for the classifica- 
tion of individuals in more nearly homogeneous groups 
than they may be classified under the "all-or-none" method.
The dominating idea of the whole scheme is to take cog- 
nizance of small gains; to make small units of accomplish- 
ment, rather than large ones, the units of measurement. 
The makers of the Point Scale also introduced the intelli-
gence coefficient, which is obtained by dividing the number
of points obtained in an examination by the number of
points obtained on the average. For example, if 40 points
were the average for eight-year-old children and a child
were to make 36 points, his intelligence coefficient would
be 36 ÷ 40 = 0.9. A similar measure was suggested earlier
but had not been put into practice prior to the time of 
the Point Scale. 
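
The coefficient is a simple ratio, and a minimal Python sketch of it follows. The function name and the guard against a non-positive average are illustrative choices, not part of the Point Scale itself.

```python
def intelligence_coefficient(points, average_points):
    """Points earned divided by the average points for the child's age."""
    if average_points <= 0:
        raise ValueError("average_points must be positive")
    return points / average_points


# The example from the text: 36 points against an average of 40.
print(intelligence_coefficient(36, 40))  # 0.9
```

A coefficient of 1.0 would mark a child who scores exactly the average for his age; values below or above 1.0 show how far he falls short of it or exceeds it.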

The testing material for the Yerkes, Bridges, and Hard-
wick Point Scale was drawn very largely from the material
used by Binet. In fact, as has been said, the original
intentions of the makers of the Point Scale were to develop
a better method but not to attempt to modify the Binet
Scale. But as the work progressed the makers were con-
vinced that much of the material in the Binet Scale was
inadequate, and changes were made where it was deemed 
necessary. The tests were selected with the intention of 
covering the various common forms of the principal mental 
functions. 

Tables I and II on the pages following show the distri- 
bution of the tests used among these principal mental func- 
tions.* 




Table I (fragment; only the closing entries of the mental-functions column survive)

Suggestibility, visual perception, comparison.
Logical judgment based on analysis and reasoning.
Ideation involving vocabulary, memory, analysis.
Logical judgment based on analysis and reasoning, attention, memory.
Visual memory, perception, attention, motor coordination.
Ideation involving analysis, imagination, command of language forms.



Some of the principles for the selection of testing material
were as follows: "Other things being equal, preference
was given to tests applicable through a considerable
range of years, such as memory span and free association;
and the different reactions to a given test which are
characteristic of successive stages of mental growth were
discriminated in the scoring wherever easily recognizable."
For example, four gradations are recognized in the free
association test, two in the definition of concrete terms, four
in counting backward, and so on.

Table II (only the column of mental functions survives)

Mental function
Motor coordination
Perception (visual)
Discrimination (visual)
Discrimination (kinesthetic)
Association
Suggestibility
Memory
Memory (auditory)
Memory (visual)
Imagination
Judgment (aesthetic)
Judgment (practical)
Judgment (logical)
Analysis and comparison
Ideation

Aside from the points mentioned above the makers of 
the Point Scale state further reasons for the superiority 
of their scale over the Binet Scale. It is committed to no 
hypothesis as to the correlation existing between chronolog- 
ical and mental age, or between the different mental func- 
tions at different stages of development. It is capable of 
giving results of ever-increasing reliability and precision 
as data accumulate and norms are established; it works 
with a smaller amount of testing material, which makes 
it possible to select better material. The Point Scale con- 
sists of 20 tests which include about 65 questions, whereas 
the Binet pre-adolescent scale consisted of 52 tests with 
approximately 100 questions. 

* A Point Scale for Measuring Mental Ability, p. 9.

Though the scale described above is for children, its
makers present a list of principles for a universally ap-
plicable scale for mental ability that will include adults
as well. Among these principles are the following:

4. Distribution of the several measurements in the series
equally among the chief mental processes, as, for example, ac-
cording to the four following categories:

(a) Receptivity, including such functions as sensibility, per-
ceptivity, discrimination, and association.

(b) Imagination, including memory, in its various aspects,
and constructive imagination.

(c) Affectivity, including simple feeling, emotion, sentiment,
volition, and suggestibility.

(d) Thought, including ideation, judgment, and reasoning.

5. Selection of twenty parts of the scale so that there shall be
five for each of the groups of mental functions, classified under
the headings receptivity, imagination, affectivity, and thought.

6. A minimum of 200 points, one-fourth of which shall belong
to each of the above-mentioned groups of processes.

The individual's score is to be expressed in terms of the four
mental processes, instead of throwing the scores all together
and determining a single norm for all mental processes.
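
Principles 4 to 6 amount to a layout rule that can be checked mechanically, and the separate score for each group is a simple tally. The following Python sketch assumes a hypothetical scale in which every part carries 10 points; the names `check_layout` and `profile` are illustrative only.

```python
GROUPS = ("receptivity", "imagination", "affectivity", "thought")


def check_layout(parts):
    """Check a proposed scale against principles 4 to 6.

    parts: list of (group, max_points) pairs, one per part of the scale.
    """
    counts = {g: 0 for g in GROUPS}
    points = {g: 0 for g in GROUPS}
    for group, max_points in parts:
        counts[group] += 1
        points[group] += max_points
    total = sum(points.values())
    return (
        len(parts) == 20                                   # twenty parts
        and all(c == 5 for c in counts.values())           # five per group
        and total >= 200                                   # at least 200 points
        and all(p * 4 == total for p in points.values())   # one-fourth each
    )


def profile(scores):
    """Sum points earned per group instead of pooling them in one total.

    scores: list of (group, points_earned) pairs.
    """
    out = {g: 0 for g in GROUPS}
    for group, earned in scores:
        out[group] += earned
    return out


# Hypothetical layout: five parts per group, 10 points per part.
layout = [(g, 10) for g in GROUPS for _ in range(5)]
print(check_layout(layout))  # True
```

Keeping the four tallies apart is what allows a norm to be set for each group of processes rather than for the pooled score.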

The two underlying principles of the Binet Scale are: 
first, the arrangement of the tests in groups corresponding 
to the years of chronological age and the consequent ex- 
pression of the results as "mental age," and second, the 
related "all-or-none" method of scoring. Many objections 
are raised against each of these principles. In reference 
to the first principle, which assumes that all normal in- 
dividuals develop mentally by similar stages, that the cor- 
relation between the different functions is the same for all 
individuals at a given stage, and that physical and mental 
development correspond, the critics of the scale claim that 
these statements are not warranted by the facts. On the 
contrary, it is pointed out that such studies as those made
by Decroly and Degand in Belgium show that on the aver-
age the Belgian children are a year and a half in advance
of the group selected by Binet at Paris. Binet's attempt
to account for this discrepancy on the ground that the
Belgian children belong to a more privileged class is not
accepted as scientific.

Binet's tests are criticized also on the ground that a 
range of six or seven years is included within what con-
stitutes the "normal age." Binet said that the children
in one quarter of Paris were found to be advanced by
four or even five years, and adds that "one must, therefore,
no longer consider the retardation or advance of three
years as an anomaly." This statement is criticized on the 
ground that this is a very large proportional variation for 
a scale that covers at most but twelve years. 

In scoring the examination, the examinee is credited with 
that age at which he passes all the tests, plus one year 
for every five tests passed from more advanced groups.
The more difficult tests designed for advanced ages do not
count any more than those lower down when the final score
is being reckoned. This seems to be an inequitable distribu-
tion of credit because more credit should be given for tests
passed in the more advanced age-groups than for tests 
passed in the lower age-groups. 
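
The objection can be made concrete with a short Python sketch of the scoring rule. It assumes that each advanced test counts as exactly one-fifth of a year (the text states only one year for every five tests), and the function name is hypothetical.

```python
from fractions import Fraction


def binet_mental_age(basal_age, tests_passed_above):
    """Mental age in years under the rule described in the text.

    basal_age: the age group at which all tests are passed.
    tests_passed_above: list of the age groups of the tests passed
    beyond the basal age; only the length of the list is used.
    """
    return basal_age + Fraction(len(tests_passed_above), 5)


# Two hypothetical children with the same basal age of seven.
easy = binet_mental_age(7, [8, 8, 8, 8, 8])      # five tests, all from the 8-year group
hard = binet_mental_age(7, [8, 9, 10, 10, 12])   # five tests reaching to the 12-year group
print(easy, hard)  # 8 8
```

Both children are credited with a mental age of eight, although the second passed much harder tests. That equality is the inequitable distribution of credit complained of above.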

Proposed Reorganization of the Binet Scale by Meu-
mann. — Meumann* has indicated the lines along which
the Binet Scale should be reorganized and extended. His
proposals for reorganization fall under three heads as
follows:

I. Endowment or Intelligence Tests Proper. — Under this
heading he includes tests on:

1. Concentration and fixation of attention.

2. The immediate retention of verbal and non-verbal material.

3. Imagination and thinking by means of descriptions of pic-
tures.

4. The combination test improved by Ebbinghaus.

5. Employing and concentrating on visual images in special
problems, as in the folded paper test, or such a test as: What
kind of a solid figure is produced when a right triangle is rotated
around one of its sides making the right angle?

6. Thinking by means of controlled association test.

7. The concentration and synthesis through comparisons and
differentiations of different difficult objects, both in the presence
of the objects and in recollection.

* Vorlesungen, Vol. II, pp. 28^288.

II. Tests of Development in the Narrower Sense. — Here
would be given:

1. Vocabulary tests.

2. Tests measuring the range of attention and the memory

3. Tests determining the examinee's ability to reproduce words,
as in the opposite test.

4. Tests in temporal orientation, and so on.

III. Tests of Environment. — These tests may be divided
into three divisions:

1. Testing school knowledge, as knowledge of number, coins,
days of week, dates, names of months, writing from a copy, and 
writing from dictation. 

2. Knowledge acquired at home; meaning of words designating 
household articles, above and below, before and after, left and 
right, correctness in speech, testing errors in speech, number of 
the fingers, and so on. 

3. Spontaneous observation, by making inventories of mental 
content.* 

Wallin's Criticisms. — Wallin criticizes the 1911 edition
of the Binet Scale on the ground that the number of tests
in each group should have been increased rather than
diminished and a greater number of functions covered.
In regard to the principle of an age-scale, Binet's critics
claim he surrendered its validity on many occasions, as,
for instance, when he accounts for Belgian children test-
ing higher than those of Paris because of their having
a more favorable environment; also when he takes excep-
tion to Katherine Johnson's results because she treated
as a single group children from different levels of
privilege; and again when he tells of the wide range
among normal children. 

Other Tests Devised to Measure Intelligence. — Many
attempts have been made by still others to devise tests
to measure intelligence. Some have attempted to improve
the Binet tests while others have devised entirely new
ones. The following have attracted considerable atten- 
tion. 

Burt* has attempted to improve the method of record-
ing the mental development of children. Instead of using
the mental age as a measure for backwardness and mental
deficiency, he has used the standard deviation (for a
definition of standard deviation see Chapter XI) of
normal pupils, that is, the pupils who are in the usual
class in school for their chronological age. Taking the
standard deviation for the various grades he found it to
be about one-tenth of the chronological age. That is, the
standard deviation of a normal child 10 years old is one-
tenth of his chronological age, or one year. A child five
years old would have a standard deviation of a half year
and one 15 years old would have a standard deviation
of one and one-half years.† He found that by taking the
standard deviation as a measure a child who was retarded
by more than three-tenths of his age needed a special
school for his training. He says:‡

... for practical purposes, "backward" may be taken to
denote children who, though not "defective," are yet unable, in
the middle of their school career to do the work even of the class
below their age; or more exactly, children who deviate below the
normal by at least one and a half times the "standard" deviation
of individuals of the same age group; and therefore are retarded
by 15 to 30 per cent of their age.

* C. Burt, The Distribution and Relations of Educational Abilities
(King and Son, London).
† Ibid., p. 31.
‡ Ibid., p. 82.

This method of recording mental age is of special im- 
portance because it has been proposed to use a similar 
method in measuring school achievements. 
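
Burt's rule of thumb lends itself to a brief Python sketch. The thresholds of one and a half and three standard deviations come from the passage above; the function names and the label "not backward" for all other children are assumptions made for illustration.

```python
def burt_sd(age_years):
    """Burt's approximate standard deviation: one-tenth of the age."""
    return age_years / 10


def burt_class(age_years, mental_age_years):
    """Classify a child by retardation measured in standard deviations."""
    deviation = (age_years - mental_age_years) / burt_sd(age_years)
    if deviation > 3:        # retarded by more than three-tenths of his age
        return "special school"
    if deviation >= 1.5:     # retarded by 15 to 30 per cent of his age
        return "backward"
    return "not backward"    # label assumed; the text does not name this group


print(burt_class(10, 8.5))  # backward
```

A child of ten with a mental age of eight and a half is retarded by one and a half years, or 15 per cent of his age, and so falls just within the backward class.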

Meumann points out that in spite of the defects of the
isolated test methods (discussed in first part of Chapter
III), these tests should not be overlooked. They may be em-
ployed as preparatory tests in segregating students for
further examination and they are valuable also as a means 
of determining quantitatively particular capacities. 

Rossolimo has an interesting and unique way of
graphically representing one's mental make-up. His
method assesses numerically eleven psychical processes,
each of which has a value assigned to it varying from
zero to ten. The eleven processes tested are attention,
will, apprehension, visual memory, memory for the ele-
ments of speech, numerical memory, comprehension, com-
bination, mechanical sense, imagination, and observation.
All of these have several subdivisions. There are ten
tests for each process or partial process and the score
for each process is indicated by a mark on a table. The
points are then joined up and the whole figure results
in what is known as a "mental profile."*

[Figure: Rossolimo's mental profile chart. Rows: Attention;
Will; Apprehension; Memory (Visual, Auditory, Numerical);
Comprehension; Combination; Mechanical sense; Imagination;
Observation. Scale: 0 to 10.]
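
A profile of this kind is easily represented as a list of points, one for each process. The Python sketch below is an illustration only: it assumes each score is a whole number from 0 to 10, the example scores are invented, and the text rendering merely imitates the chart.

```python
PROCESSES = [
    "attention", "will", "apprehension", "visual memory",
    "memory for the elements of speech", "numerical memory",
    "comprehension", "combination", "mechanical sense",
    "imagination", "observation",
]


def mental_profile(scores):
    """Return the profile as (process, score) points in chart order."""
    points = []
    for process in PROCESSES:
        score = scores[process]
        if not 0 <= score <= 10:
            raise ValueError(f"{process}: score must lie between 0 and 10")
        points.append((process, score))
    return points


def draw(points):
    """One text row per process; '*' marks the score on a 0-10 axis."""
    return "\n".join(f"{name:<34}" + "." * score + "*" for name, score in points)


# Invented scores, used only to show the shape of the output.
example = {name: (i * 3) % 11 for i, name in enumerate(PROCESSES)}
print(draw(mental_profile(example)))
```

Reading down the column of asterisks gives the joined-up line of the profile.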

* "Mental Profiles: A Quantitative Method of Expressing Psycho-
logical Processes in Normal and Pathological Cases," Journal of
Experimental Pedagogy, Vol. 1, pp. 211-214.

De Sanctis devised a series of six tests primarily to
measure not the level of intelligence, as do the Binet tests,
but the degree of mental deficiency.

A general idea of the de Sanctis tests may be gained
from the description given below:

1. Five balls each of a different color are placed before the
subject and the examiner says: Give me a ball. The time and
manner of the response is noted.

2. The same five balls are arranged in a row and the examiner
says: Which is the ball you just gave me? The time of the
response is again noted.

3. The subject is shown a wooden cube such as is used in a
kindergarten and the examiner says: Do you see this block of
wood? After the subject has noted it, the examiner continues:
Pick out all the blocks like this on the table. On the table have
been placed three cones, five cubes and two parallelepipeds. The
time for selecting and arranging the cubes is noted.

4. The examiner shows the subject a cube and says: Do you
see this block? After noting the block the examiner continues:
Point out a figure on the form chart that looks like it. The form
chart consists of ten rows of squares, triangles and rectangles
with fourteen figures in each row, or 140 figures in all. If the
subject points to a square, the examiner's next command is:
Take this pencil (or pointer) and point out all the squares on
the chart as fast as possible, without missing any, taking the
figures line by line. The time, mistakes, and omissions are noted.

5. Additional blocks are placed on the table in such a way
that the distance between the cubes is not more than two centi-
meters. Each cube should be just one-half centimeter longer on
each side than the next smaller one. If it is desired to make
the test more difficult, the number of cubes may be increased or
the difference in size decreased. The examiner then says: Here
are some more blocks like those you have pointed out on the
chart. Look at them carefully and tell me: (1) How many there
are. (2) Which is the largest. (3) Which is the farthest away
from you. The time, errors and omissions are noted.

6. The examiner asks: Do large objects weigh more or less
than small objects? Why does a small object sometimes weigh
more than a large one? The second question is asked if the first
one has been answered correctly. The subject is then asked: Do
distant objects appear larger or smaller than near objects? Do
they only seem smaller or are they really smaller? (The last
question will show whether the subject is aware of optical illu-
sions.)
De Sanctis points out that if the subject does not pass
the second test the mental deficiency may be considered
of a high grade. If the subject cannot go beyond the
fourth test, or if he makes many mistakes, or is not at all
certain in the fifth, the mental deficiency may be considered
slight. If the sixth test is completed without a mistake the
subject may be said to present no mental deficiency. 
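
The three rulings reported above may be set down as a small Python function. It is a sketch only: the argument names are hypothetical, and the cases on which the text is silent (for example, a subject who passes the third test but not the fourth) are returned as "not stated in the text" rather than guessed.

```python
def de_sanctis_grade(tests_passed, shaky_on_fifth=False, clean_sixth=False):
    """Grade of mental deficiency from the three rulings in the text.

    tests_passed: how many of the six tests, taken in order, were passed.
    shaky_on_fifth: many mistakes or marked uncertainty in the fifth test.
    clean_sixth: the sixth test was completed without a mistake.
    """
    if not 0 <= tests_passed <= 6:
        raise ValueError("tests_passed must lie between 0 and 6")
    if tests_passed < 2:
        return "high grade"   # does not pass the second test
    if tests_passed == 4 or (tests_passed >= 5 and shaky_on_fifth):
        return "slight"       # stops at the fourth, or falters in the fifth
    if tests_passed == 6 and clean_sixth:
        return "none"         # sixth test completed without a mistake
    return "not stated in the text"


print(de_sanctis_grade(1))  # high grade
```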

II. Group Intelligence Tests

One of the greatest drawbacks to the Binet Scale from
an administrative standpoint is the fact that the tests must
be given to but one individual at a time. This makes their 
administration a great burden if the number of persons 
to be given the tests is very great. In order to overcome 
this difficulty, group tests for mental ability have been 
introduced. The gain from an administrative standpoint, 
however, has not been an "unmitigated blessing," since 
much had to be sacrificed in order that the test might be 
given to groups of children. 




On the other hand, the group intelligence tests were used 
in the American army with marked success. The experi-
ence with the drafted men in the army proved that not
only individual tests of intelligence, but group mental tests 
could be used to good advantage. The tests were thus 
introduced into the army, and psychological methods were 
used in the selection of its personnel. 

The American army was the only army that made use 
of intelligence tests in the selection of officers. As a result 
of the application of the group intelligence tests to hun-
dreds of thousands of men, and individual tests to a very
large number of the same men, correlations were made
between the two test series, and between the single tests
and various other criteria for judging intelligence. By 
these methods the group tests were found to be very re- 
liable. 

For the army Beta group mental tests, no educational 
training was necessary, not even the ability to read and 
write. The tests were given to illiterate foreigners who 
did not know the English language and also to deaf 
mutes. Some of the illiterate foreign recruits made scores 
higher than the average of the commissioned officers. 

For the army Alpha group intelligence tests, ability to 
read and write and to understand English is essential;
therefore, education through the third and fourth grades 
is probably necessary, but after this point, educational 
training has little effect upon the results. As a matter 
of fact, some with almost no educational training made 
among the highest scores. 

Principles Involved in the Selection of Group Tests. —
The first requirement in the selection of any test is that 
its beginning be easy enough and the directions clear 
enough that all the children may be able to do a part 
of the test. 

One of the purposes for devising and standardizing
group tests is to provide a scale for measuring the intelli-
gence of large groups of children with sufficient accuracy
to sort out all children of questionable mentality. An-
other purpose is to obtain a group scale that will dis-
criminate between dull, average, and supernormal chil-
dren. When group tests are given one can be reasonably
certain that children who make exceptionally low scores 
will, when tested individually by the Binet revisions, be 
found to be mentally deficient. This method saves much 
time. 

Requirements of Group Tests Are Many. — They must 
not only possess the characteristics necessary for the in-
dividual tests but satisfy other demands. Simplicity is
an important criterion for the selection of group tests:
simplicity of material, of directions, of response, and of scor-
ing. Since a large group of children are to be tested
at one time, it is necessary to select tests in
which the material used can be easily carried and quickly
distributed.

Simplicity of directions is one of the very essential 
characteristics of group tests for the lower grades, espe-
cially those below the third. The following experience
recorded by Miss Frances Lowell illustrates the point in
question. She says:

Frequently one feels confident that the directions for a certain
test are perfectly clear and that they could not possibly be
misunderstood, and yet when the test is given to a group of
children he finds that the meaning is entirely lost. Good English
must frequently be sacrificed, for the child in the primary grades
is surprisingly limited in vocabulary. Originally in giving the
directions for one of the six-year tests the writer said:
"Make a cross in the largest square." The result was puzzling,
for the children crossed the smallest square as often as they did the
largest. The test apparently was a failure for that age-group.
However, before discarding it the writer decided to experiment a
little to see if the difficulty could be discovered. Instead of having
the children cross the square they were asked to point to the larg-
est square, whereupon one little girl tearfully informed the writer
that she didn't know what that meant. That solved the problem;
from then on children were instructed to make a cross in the
biggest square.

The Terman Group Tests of Mental Ability. — Dr. Lewis
M. Terman devised a series of group tests for grades 7
to 12 inclusive. Like the Binet tests they consist of a
series of questions and problems selected from a large
mass of possible test material. Each separate item was
correlated with a dependable measure of mental ability.
The preliminary trial series from which the tests were
finally made was composed of a two-hour test of 13
parts with 886 different items. As a result of the prelimi-
nary tryout, three of the 13 parts were eliminated and
the number of items in the remaining ones reduced to
370. If an item failed to differentiate pupils of known
brightness from pupils of known dullness, it was elimi-
nated. Much attention was given to simplicity and con-
venience in making up the tests. The tests are issued in
two forms, A and B. Each contains 185 questions or
problems. In any test of intelligence there is always a
margin of error. The author of this test suggests that 
both forms A and B be used and that the average result 
obtained from both forms be used as a possible basis for 
guidance of the individual pupil. Each form consists of 
10 groups of tests. Each group has a number of ques- 
tions or problems to be solved. 
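
The elimination rule just described can be illustrated with a brief sketch in Python. It is an illustration only, under stated assumptions: the text says that an item was dropped if it failed to differentiate pupils of known brightness from pupils of known dullness, but it does not name the statistic used. The sketch assumes a simple difference in pass rates between the two groups and a hypothetical cutoff, `min_gap`. The function names and the tryout records are invented for the example. The last function takes the average of the Form A and Form B scores, as the author of the test suggests.

```python
def pass_rate(responses, item):
    """Fraction of pupils in a group who answered the given item correctly."""
    return sum(pupil[item] for pupil in responses) / len(responses)


def select_items(bright, dull, min_gap=0.2):
    """Keep items that the bright group passes more often than the dull group.

    min_gap is a hypothetical cutoff; the source does not state one.
    """
    n_items = len(bright[0])
    kept = []
    for item in range(n_items):
        gap = pass_rate(bright, item) - pass_rate(dull, item)
        if gap >= min_gap:
            kept.append(item)
    return kept


def combined_score(form_a, form_b):
    """Average of the scores a pupil makes on Form A and Form B."""
    return (form_a + form_b) / 2


# Invented tryout records: one row per pupil, one column per item (1 = correct).
bright_group = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
dull_group = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

kept_items = select_items(bright_group, dull_group)
print(kept_items)                # only item 0 separates the two groups
print(combined_score(120, 130))  # invented Form A and Form B scores
```

In this invented record only the first item is retained, since the other three are passed equally often by both groups and so tell the examiner nothing about the difference between them.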

Test 1. Information. — It consists of 20 parts. The pupil is
told to draw a line under the one word that makes the sentence
true. Example: Coffee is a kind of bark, berry, leaf, root.
Time, two minutes.

Test 2. Best Answer. — It consists of 11 parts. The examinee
is asked to put a cross before the best answer to the question or
statement made. Example: Spokes of a wheel are often made
of hickory because: (1) hickory is tough, (2) it cuts easily, (3)
it takes paint nicely. Time, two minutes.

Test 3. Word Meaning. — It consists of 30 pairs of words
arranged as follows: Expel — retain     same — opposite.

The instructions are: If the words mean the same, the examinee
is to underscore the word same; if they mean the opposite, underscore
the word opposite. Time, two minutes.

Test 4. Logical Selection. — The instructions are to draw a
line under two words that the thing always has; underline two
and only two in each line. Example: A horse always has harness,
hoofs, shoes, stable, tail.

Test 5. Arithmetic. — This test consists of 12 problems of which
the one given below is a sample: "How many hours will it take
a person to go 66 miles at the rate of 6 miles an hour?"

Test 6. Sentence Meaning. — It consists of 24 parts. Instructions:
"Draw a line under the right answer." Example: Does
a conscientious person ever make a mistake? Yes, No.

Test 7. Analogies. — Consists of 20 sentences in which the
analogies are to be picked out. Example: Ear is to hear as
eye is to table, see, hand, play. The examinee is required to
underscore the proper word. Time, two minutes.

Test 8. Mixed Sentences. — Consists of 18 sentences with
words not arranged in proper order. The instructions are to
arrange the words in correct order and then tell whether the
sentence is true or false by underscoring one of these words.

Example: (1) True bought cannot friendship be     true, false.

Time, three minutes.

Test 9. Classification. — Consists of 18 groups of words, one 
word in each group of which does not belong to that class. The 
instructions are to cross out the word that does not belong there. 
Example: Automobile, bicycle, buggy, telegraph, tram. Time, 
three minutes. 

Test 10. Number Series. — Consists of 12 rows of numbers
arranged in such a way that the examinee can predict what the
next two missing numbers should be by the arrangement of the
numbers that are given. Example: 3 — 8 — 13 — 18 — 23 —
28 — ? — ?
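
A row of the number-series kind can be worked mechanically when the rule is a constant difference, as it is in the sample row, whose opening numbers 3, 8, 13, 18, 23 rise by 5 each time. The Python sketch below assumes that one rule only. The rows of the actual test presumably follow a variety of rules which the text does not list, and the function name `next_two` is hypothetical.

```python
def next_two(series):
    """Extend a series that has a constant difference by two more terms."""
    differences = {later - earlier for earlier, later in zip(series, series[1:])}
    if len(differences) != 1:
        raise ValueError("series does not have a constant difference")
    step = differences.pop()
    last = series[-1]
    return [last + step, last + 2 * step]


# The opening numbers of the sample row.
sample = [3, 8, 13, 18, 23]
answer = next_two(sample)
print(answer)
```

A row whose differences are not all equal is rejected rather than guessed at, since this sketch makes no attempt at the harder rules.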

Examination Form B is constructed on the same plan
as Form A and is of approximately the same difficulty.




The scoring is done by ten scoring keys which, for 
convenience, are printed on a single sheet. 

Mental ability should be the fundamental basis for
all grading, promotion, and classification of pupils. Terman
emphasizes the fact that the purpose of these tests
is "to make a difference in the educational treatment of
pupils, not to gratify a merely idle curiosity regarding
their intellectual status."

Accordingly, one of the chief functions of these tests
is in the classification of pupils into different groups for
special types of instruction, as in the Oakland Plan, which
divides the pupils into three groups: the bright, the average,
and the slow, for separate instruction according to
their needs.

The use of the tests will clear up many misunderstandings
regarding individual children. They help to give
reasons for the seeming misfits in school. They give educational
guidance that helps to simplify the problems of
vocational guidance. It is not claimed, of course, that
the tests are infallible, but it is claimed that they make
a very important point of departure for further study of
the pupil.

The National Intelligence Tests. — In March, 1919, the
General Education Board granted the National Research
Council the sum of $25,000 to be used in devising methods
for measuring the intelligence of school children. The
Research Council caused to be organized a committee to
do this work. The committee was composed of Messrs.
R. M. Yerkes, chairman, M. E. Haggerty, L. M. Terman,
E. L. Thorndike, and G. M. Whipple.

The committee decided to prepare a series of tests for
children from grades 3 to 8 inclusive. From a great mass
of data, material was selected for 21 tests for a preliminary
trial. Children in a number of eastern cities, including
Washington, D. C., Cleveland, New York, Richmond,
and Alexandria, were given the preliminary tests. Dr.
Truman Kelley statistically analyzed the data obtained
from this preliminary trial. Guided by Dr. Kelley's report
and other information relative to the characteristic behavior
of the several tests, the committee selected 10 tests
from the list of 21 for further use.

These ten tests were arranged in two groups, or series,
of five tests each. The tests selected are an adaptation
for school purposes of the group intelligence tests used
in the examination of recruits in the United States Army.

Each scale is a complete unit for testing; but it is
recommended that both scales be used in order that one
may serve as a check on the other. A short preliminary
exercise is given before each regular test to acquaint the
pupil with the general nature of the work to be done.
The tests in Scale A consist of:

Test 1. Arithmetical Reasoning. — This test consists of 16 problems
growing progressively more difficult from the first to the
sixteenth. The time allotted to this test is five minutes. The
instructions given the pupils are to solve as many problems as
they can in the time allotted.

Test 2. Sentence Completion. — This test consists of 20 sen- 
tences, parts of which have been left out. The instructions are 
to fill the blanks with words to make the sentences sound sensible 
and right. Time, four minutes. 

Test 3. Logical Reasoning. — Consists of 24 parts. Part, or
row, 1 is given here to indicate the general nature of the test: 

Elephant (circus, ears, hay, keeper, trunk) 

The instructions are: "In each row draw a line under each 
of the two words that tells what the thing always has." Time, 
three minutes. 

Test 4. Same-Opposite. — Consists of 40 pairs of words 
separated by a blank line, as new — old, still — noisy, fall — drop,
etc. The instructions are to write "S" between them if they 
mean the same and "D" if they mean different things. 




Test 5. Symbol-Digit. — Consists of nine irregular-shaped characters
arranged in six rows with 20 characters in a row. The
nine digits, 1, 2, 3, etc., are placed in a line under the nine characters,
each digit corresponding to a character. This is called
the key. The instructions are to make under each drawing or
character the number you find under the drawing in the key.
Time, three minutes.
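
The working of the symbol-digit key may be sketched in Python as a simple look-up. Everything specific in the sketch is an assumption made for illustration: the nine symbols stand in for the irregular-shaped characters of the printed test, and the scoring rule of one point for each correct digit is not stated in the text. The names `SYMBOLS`, `KEY`, and `score_row` are hypothetical.

```python
# Nine stand-in symbols; the printed test uses irregular-shaped characters.
SYMBOLS = ["+", "#", "@", "%", "&", "*", "=", "~", "^"]

# The key: each symbol is paired with one of the digits 1 to 9.
KEY = {symbol: digit for digit, symbol in enumerate(SYMBOLS, start=1)}


def score_row(row, answers):
    """Count the answers that agree with the key (assumed: one point each)."""
    return sum(1 for symbol, answer in zip(row, answers) if KEY[symbol] == answer)


# Part of an invented row and the digits a pupil wrote beneath it.
row = ["@", "+", "^", "#", "@"]
answers = [3, 1, 9, 5, 3]
row_score = score_row(row, answers)
print(row_score)
```

In the invented row the pupil wrote 5 under the fourth character where the key gives 2, so four of the five answers are counted.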

Scale B consists of five tests as follows:

Test 1. Computation. — This test is made up of 22 problems
in arithmetic. The problems grow progressively more difficult
as the pupil works down the page. Time, four minutes.

Test 2. Information. — It consists of a series of 40 sentences,
each sentence containing four words, only one of which makes
the sentence true. Sentence Number 1 reads as follows: The
day before Sunday is Friday, Monday, Saturday, Tuesday. The
instructions are to draw a line under the one word that makes
the sentence true. Time, four minutes.

Test 3. Vocabulary. — Forty questions to be answered by "yes"
or "no" are given. Time, three minutes.

Test 4. Analogies. — This test consists of 32 parts, each part
consisting of seven words. The first two words bear a definite
relation to one another. The third word bears an analogous
relation to one of the four succeeding words in the line. Part
1 of this test is given here to illustrate the point: Finger —
hand     toe — box, foot, doll, coat. The instructions are:
Read the first three words in each line. Then read the last four
words and draw a line under the right one.

Test 5. Comparison. — The test consists of 50 figures, names,
and drawings, arranged in pairs, with a dotted line between
them. The instructions are: "If the two things in a pair are
the same write 'S'; if they are different write 'D.'"

The Haggerty Intelligence Examinations. — Dr. M. E.
Haggerty devised a series of intelligence tests known as
Intelligence Examinations, Delta 1 and Delta 2. Delta 1
consists of six groups of tests designed for children of
grades one to three. Delta 2 is composed of the same
number of tests and is designed for children in grades
three to nine. Each of these tests is preceded by a fore-exercise
which gives the pupil an idea of what he is to
do in the regular test.

The material in Delta 1 is of such a nature that a child
unable to read may do several of the exercises. The first
regular test is one of oral directions. It consists of a page
of pictures, and the pupil depends on the oral directions of the
examiner to do the test. For instance, the first picture
is a mouse and the oral directions are to draw a ring
around the mouse.

Test 2. Copying Designs.

Test 3. Picture Completion. — A part of the picture is left out
and the child is asked to supply the missing parts.

Test 4. Picture Comparison. — Here each line contains two
pictures separated by a blank line. If the pictures are the same
the pupil writes "S" on the blank line; if they are different he
writes "D."

Test 5. Symbol-Digit. — The nature of the symbol-digit test
was explained in the discussion of the National Intelligence Tests.

Test 6. Word Comparison. — This is the first of the series that
requires a reading ability on the part of the examinee. This
test is like the word comparison test described above.

Delta 2 consists of six tests as follows:

Test 1. Sentence Reading. — It consists of a series of 40 questions
growing progressively more difficult that may be answered
by "yes" or "no." Time, five minutes.

Test 2. Arithmetical Problems. — A list of 20 problems is given
and the pupil is asked to solve as many as he can in five minutes.

Test 3. Picture Completion. — Time, four minutes.

Test 4. Synonym-Antonym. — Consisting of 40 pairs of words.
This test is the same as the same-opposite test described above.

Test 5. Practical Judgment. — It consists of a series of 16
sentences and questions, each of which has three answers, or
reasons for being, one of which is right. The child is asked to
put a cross before the best answer to the statement or question.




Test 6. Information. — Consists of 40 sentences. It is similar
to the one described under the National Intelligence Tests.

The Otis Group Intelligence Scale. — The Otis Group
Intelligence Scale is designed to test general mental ability.
The scale is issued in two series, a Primary Examination
and an Advanced Examination. The Primary Examination
is designed especially for the kindergarten and for
grades 1 to 4, and consists of eight tests which do not
involve the ability to read. The Advanced Examination,
consisting of ten tests, is designed for grades 5 to 12; in
fact, for all literate persons, including university students.

In each series, each test is independent of every other
test, but the tests taken together form a scale, which is
printed in the form of an examination booklet. The Advanced
Examination consists of ten tests, each test consisting
of a series of questions and problems. The tests
are as follows: (1) following directions, (2) opposites,
(3) disarranged sentences, (4) proverbs, (5) arithmetic,
(6) geometric figures, (7) analogies, (8) similarities, (9)
narrative completion, and (10) memory. This advanced
examination is suitable for testing all students from the
fifth grade upward through the high school and the university.

The Otis tests are very popular and are praised very
highly by others who have designed group tests of intelligence.
Terman writes as follows about the Otis Advanced
Examination:

It is applicable to any individual, whether child or adult, who
has had the equivalent of three or four years of schooling. With
subjects of this amount of schooling the Otis Scale probably
comes as near testing raw "brain" power as any system of tests
yet devised. Indeed, it was the first scientifically grounded and
satisfactory scale for testing subjects in groups. . . . No one
else has done so much as Dr. Otis to free intelligence tests from
the influence of the personal equation of the examiner. Perhaps
it is too much to hope that any mental tests can be made "fool
proof," but it is not too much to say that the Otis Scale can be
correctly given and correctly scored by any one who is intelligent
enough to teach school. The plan of arranging the tests so that
they may be scored by the use of stencils is a contribution of
both practical and scientific importance.

The Dearborn Group Intelligence Tests. — Professor
Dearborn has designed two series of tests, one for grades
from 1 to 3, and the other for grades 4 to 9 inclusive.
Series I consists of general examinations 1, 2, and 3. Series
II consists of general examinations 4 and 5. Though
these tests differ from the other group tests described
above, they are very similar to them in many respects and
involve practically the same mental functions as the other
tests.

Uses of Intelligence Tests. — The uses of intelligence
tests are many and varied. They are beginning to play
an important part in determining vocational fitness. No
one will claim, of course, that they will tell us unerringly
for which of a thousand or more occupations an individual
is best fitted. Sometime in the near future, however,
industries will begin to set the minimum intelligence
quotient for employees in their business. We now know
that of the people who belong to the ranks of the industrially
inefficient there is a very high percentage who are of subnormal
intelligence. Of 150 "hoboes" that Mr. Knollin
tested a few years ago, 15 per cent belonged to the moron
grade of mental deficiency and almost as many were
borderline cases.

Cf. Lewis M. Terman, The Measurement of Intelligence, p. 54.
Army Mental Tests, Methods, Typical Results and Practical Applications,
November 22, 1918, Washington, D. C.

A bulletin published by the War Department on Army
mental tests shows the intelligence level for the various
occupations as determined by intelligence tests. The scores
for some of the occupations are given below:

45 to 49. — Farmer, laborer, general miner, and teamster.
50 to 54. — Stationary gas engine man, horse hostler, horseshoer,
      tailor, general boilermaker, and barber.
55 to 59. — General carpenter, painter, heavy truck chauffeur,
      horse trainer, baker, cook, concrete or cement
      worker, mine drill runner, bricklayer, cobbler,
      caterer.
60 to 64. — General machinist, lathe hand, general blacksmith,
      brakeman, locomotive fireman, auto chauffeur,
      telegraph and telephone lineman, butcher, bridge
      carpenter, railroad conductor, railroad shop
      mechanic, locomotive engineer.
65 to 69. — Laundryman, plumber, auto repairman, general
      pipefitter, auto engine mechanic, auto assembler,
      general mechanic, tool and gauge maker, stock
      checker, detective and policeman, toolroom expert,
      ship carpenter, gunsmith, marine engineman,
      hand-riveter, telephone operator.
70 to 74. — Truckmaster, farrier, and veterinarian.
75 to 79. — Receiving clerk, shipping clerk, stock keeper.
80 to 84. — General electrician, telegrapher, band musician,
      concrete constructor foreman.
85 to 89. — Photographer.
90 to 94. — Railroad clerk.
95 to 99. — General clerk, filing clerk.
100 to 104. — Bookkeeper.
105 to 109. — Mechanical engineer.
110 to 114. — Mechanical draughtsman.
115 to 119. — Stenographer, typist, accountant, civil engineer,
      Y. M. C. A. secretary, medical officer.
125 and over. — Army chaplains, engineer officers.

With the educational levels of the various occupations
thus determined, those attempting to give vocational
guidance may have the sanction of science in the judgments
they make. Suppose, for example, a pupil aspired
to be a mechanical engineer. The intelligence level for
mechanical engineers is from 105 to 109. Suppose further
that the intelligence level of this particular pupil is only 82.
Then the vocational counselor may say to him that he is
aspiring to do work on an educational level several units
beyond the level for which his general intelligence fits him
and that, if he persists in his attempts to be a mechanical
engineer, he will find it extremely difficult to succeed because
he does not have the potential power to succeed
on that level.
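
The counselor's comparison in this example can be set out as a short Python sketch. It is a hedged illustration, not a procedure given by the Army bulletin: it assumes that the lower bound of an occupation's band may be treated as the level a candidate should reach, and it copies only a few of the bands from the list above. The names `BANDS`, `shortfall`, and `within_reach` are hypothetical.

```python
# A few score bands copied from the list above: occupation -> (low, high).
BANDS = {
    "farmer": (45, 49),
    "general carpenter": (55, 59),
    "general machinist": (60, 64),
    "bookkeeper": (100, 104),
    "mechanical engineer": (105, 109),
}


def shortfall(score, occupation):
    """Points by which a score falls below the occupation's band (0 if none).

    Using the lower bound of the band as the mark to reach is an assumption.
    """
    low, _high = BANDS[occupation]
    return max(0, low - score)


def within_reach(score):
    """Occupations whose band begins at or below the given score."""
    return sorted(name for name, (low, _high) in BANDS.items() if score >= low)


pupil_score = 82
gap = shortfall(pupil_score, "mechanical engineer")
reachable = within_reach(pupil_score)
print(gap)        # 105 - 82
print(reachable)
```

For the pupil of the example the shortfall is 23 points, which is the sense in which his aim lies "several units" beyond his level. The sketch says nothing of social or mechanical ability, and a comparison of this kind is at most a starting point for the counselor's judgment.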

There are many things that the vocational counselor
must bear in mind when attempting to measure intelligence
and determine the vocational fitness of individuals for the
various businesses and professions. It may be, as Thorndike
says, that we have three intelligences instead of one
general intelligence, namely abstract intelligence, social
intelligence, and mechanical intelligence. The intelligence
tests we now employ are primarily designed to measure
abstract intelligence, the ability to do abstract reasoning
in arithmetic, logic, to deal successfully with abstract
ideas, etc. An individual may make a high score with
tests of this kind and yet score low in matters of social
intelligence. The following example will make the matter
clear: College professors, teaching mathematics, are
usually considered to be good reasoners. Mathematics is
supposed to bring into play the reasoning faculties of the
mind. If one is a good reasoner in mathematics, will it
follow that he will be able to reason successfully in a business
situation which is pretty largely social? We know
that there are many business men who would not score as
high as the mathematicians on the intelligence tests now
in use, but would be very much better in analyzing a business
situation than the college mathematician. McCall
points out that there is a much higher correlation within
any one of these intelligences than between any two of
them.




If these hypotheses are true, then the vocational expert
must be slow to assign an individual to a particular edu- 
cational level unless he has tested him with tests that 
reveal his level in that particular intelligence. 

The Binet tests found early and widespread use in 
juvenile courts, in state surveys for feeble-mindedness, and 
in many other fields of research. Each of the various 
fields has a literature of its own. It has been estimated 
that approximately 4,000,000 pupils were tested in the
public schools within thirty months after the appearance 
of the first group intelligence scale. Children of the inter- 
mediate and grammar grades are the ones most favored 
in this respect. Classifications are being made on a basis 
of the scores made on these tests. It is not claimed that 
all the classifications made are wise and will lead to better 
results. In the main, however, the results are gratifying 
and have been worth while. 

The workers in the field are growing more cautious as
the work advances and are warning those who are less 
familiar with the limitations of the tests to be very con- 
servative in their claims for them. 

Wherever intelligence tests have been given in the 
schools, they have shown that approximately 2 per cent 
of the school population will never develop beyond the 
11 to 12-year-old level. The tests are giving us much 
valuable information along many lines. Healy and others 
have pointed out that courts are in the habit of administering
punishment to juvenile offenders without knowing
very much about the mentality of those whom they
direct. A very high percentage of the social offenders
are mentally deficient. Mental testing is beginning to shed 
much light on these problems and less injustice will 
undoubtedly be done in the future in passing judgment 
on our social offenders. 

The use and abuse of the tests have aroused widespread
criticisms, both constructive and destructive. In any event,
the criticisms have given us a clearer understanding of
what intelligence is and how to measure it.

III. Summary and Evaluation of the Measurement of
Intelligence

In the foregoing pages we have noted some of the major 
problems that confronted those attempting to measure in- 
telligence. We have also pointed out some of the attempts 
to solve these problems and have noted some of the more 
important tests and scales now in use. We shall now attempt
to summarize and evaluate the movement to measure
intelligence and bring our description of it up to date
and also note the fields where special research work will 
probably be done in the immediate future. 

We noted in the first part of Chapter III that the scales, 
or groups of tests to measure intelligence, have arisen from 
the individual tests. The mental scales are merely a group- 
ing together of these individual tests in order to give a 
more general picture of the mental make-up of the indi- 
vidual. The first tests were concerned with the specific 
"faculties," capacities, or abilities of the mind. They 
came into existence when the old "faculty psychology" 
was still in good repute. It was then thought that the 
proper way to measure intelligence was to choose any 
faculty or trait of the mind for investigation and that the 
data obtained from the measurement of the trait chosen 
would be indicative of the general intelligence of the in- 
dividual. It was also thought that these faculties, or 
abilities, functioned somewhat independently of one an- 
other and that they were easily isolated for investigation. 

It was soon found, however, that the old "faculty
psychology" idea was untenable and that, instead of a
trait or capacity functioning more or less independently of
other capacities, it was very closely associated with them
in its functioning and that the complete isolation of a
mental trait for measurement was impossible.

All are agreed that there is a native capacity that con- 
ditions all mental functioning. Just what this capacity is 
and how it manifests itself are unsettled questions. It
must not be inferred, however, that, because psychologists 
cannot define and completely measure this native endow- 
ment, we cannot turn the measurements thus far made to 
good account. Data obtained from measurements now made 
are not only of theoretical and scientific value but prac- 
tical value as well. 

We are just entering one of the most fertile fields of
research that science will ever explore. Everything of
a constructive nature that man does is conditioned and
determined by this mental capacity we are attempting to 
measure. All philosophy, science, and art wait upon its 
growth and development. 

Methods Are Yet Crude. — Just as the pioneers entering
a new country must of necessity use crude methods
until explorations are made and the needs of the country
determined, so the scientist entering a new field, with
strange environment, and with tools and equipment designed
for other fields, must work at a disadvantage until
he has orientated himself and has his bearings in the new
field. The psychologists are just beginning to get their
bearings in this new field of intelligence measurements.
They are beginning to see that many of the implements
used for measurements in other fields are ill-adapted to
this new enterprise. They are also learning that certain
other tools are indispensable in the explorations and
measurements of mental traits and capacities. Tests involving
merely visual acuity, such as the cancellation of
the A's on a printed page, for instance, apparently do
not exercise enough of this native endowment to be of
much value in judging the general intelligence of an individual.
Hence they are being discarded as measures of
intelligence. The same may be said of most tests functioning
on a perceptual level.

Fortunately, after a period of about twenty years of
work in attempting to measure intelligence, we have a
symposium by a group of thirteen leading American
psychologists on what intelligence is, how it may best be
measured, and what the next steps are in this field.

Terman in attempting to answer these questions in the
symposium has tersely stated the situation in regard to
the value of a right conception of general intelligence.
After endorsing the conception of Meumann that we should
first find out "what is demanded of intelligence and then
analyze the mental functions which meet that demand,"
he says:

If we accept this view it is evident that the important intellectual
differences among men will not be found on the sensory,
perceptual, or purely reproductive level. It is well known that
a moron may be able to see, hear, taste, or smell, react to a
signal, balance a bicycle, steer an automobile, or cancel A's about
as well as an intellectual genius. The latter would be somewhat
his superior in memory for nonsense syllables, would
excel him more in logical memory, and would outclass him hopelessly
in the ability to distill meanings from the raw products
of sensation and memory. The essential difference, therefore,
is in the capacity to form concepts, to relate in diverse ways,
and to grasp their significance. An individual is intelligent in
proportion as he is able to carry on abstract thinking.

In answer to the retort that some may make that he
is simply singling out a particular mental trait for special
worship, that other traits are just as valuable as abstract
thinking, and that it is a kind of intellectual snobbery
that holds that general intelligence is manifested chiefly
by one's ability to do abstract thinking, he says:

Civilization with its science, art, government, religion, philosophy,
and systems of credit, is unthinkable except as a product
of concept elaboration and symbolic thinking. . . . It cannot be
disputed . . . that in the long run it is the races which excel
in abstract thinking that eat while others starve, survive epidemics,
master new continents, conquer time and space, and
substitute religion for magic, science for taboos, and justice for
revenge. The races that excel in conceptual thinking could, if
they wished, quickly exterminate or enslave all the races notably
their inferiors in this respect. Any given society is ruled, led,
or at least molded by the five or ten per cent of its members
whose behavior is governed by ideas. The typical pick-and-shovel
man does his thinking chiefly on sensori-motor and perceptual
levels. Add a little more ability to think on the representative
level and he may be able to repair your automobile,
build you a house according to an architect's specifications, or
nurse you in illness. Add a large measure of ability to associate
abstract ideas into complex systems and he can design a new
type of engine, draft the plans for a skyscraper, or discover a
curative serum.

In regard to the next step in research in the measurement
of intelligence, the general consensus of opinion of
the men taking part in this symposium was that the
immediate task is to refine the tests and catalogue the
characteristics that are recognized as belonging to higher
intelligence; that is, those elements that emphasize, more
than any now existing, deliberation and sustained rational
ability. There must be a courageous attack upon
the problem of measurement of other than intellectual
factors. The problem of special aptitudes must be
attacked and the general technique of measurement
improved. While this programme is a broad one, nevertheless,
the problems are being vigorously attacked and
progress is being made.

Ibid., pp. 127-128.




One of the fields that is receiving special attention at 
the present time is the measurement of the non-intellectual 
traits of individuals. The work of Dr. Downey on the 
"Will-Profile" suggests the possibility of supplementing 
our intelligence examinations by objective measures of the 
so-called character-traits. 

Dr. Haggerty in summing up the work of mental
measurements for the past year, says: "If a single summary
phrase were useful to indicate the drift of current
discussion, we might choose 'the inadequacy of intelligence'
as a suitable title."

It is obvious to the careful experimenter that tests of
the type now in use do not give all the information that
we need to know about children. Either the definition
of intelligence must be broadened to include other traits,
or we must conclude that there are traits of a non-intellectual
type that determine to a large degree the success
or failure of an individual in life. The functioning of
the eye muscles may, for instance, determine the amount
of reading an individual can do, and thus condition his
entire life programme.

Industry, which is apparently beyond the limits of what
we call intelligence, has much to do with success. Haggerty
directed an experiment a few years ago, the purpose
of which was to study the characteristics of 50 men
admittedly successful and 50 others who were obviously
failures in life.

When the data were combined for each of the successful
men the result showed clearly that, in the combined
opinions of all the judges, the quality most apparently
conducive to success was industry, which was defined as
"thorough, persistent, painstaking, enduring" and the
opposite of "lazy, sluggish, indifferent, superficial." The
nine traits ranking next in order were: "efficiency, attentiveness,
loyalty, prudence, honesty, adaptability, sympathy,
tactfulness, and cheerfulness."

"Recent Developments in Measuring Human Capacities," Journal
of Educational Research, Vol. 3, April, 1921. Address delivered before
the National Association of Directors of Educational Research at
Atlantic City, N. J., March 3, 1921.
Ibid., p. 245.

Most of these traits are beyond the limits of what
we ordinarily call general intelligence. Such non-intellectual
traits as industry, loyalty, honesty, tactfulness,
sympathy, and cheerfulness weigh heavily in favor of
success, and such other non-intelligent traits as self-assertion,
pride, conceit, jealousy, quarrelsomeness,
suggestibility, and intolerance make their contribution in the
direction of failure. Either our tests of intelligence are
inadequate or intelligence itself is inadequate to produce
success.


We have many cases of ciildren in the public school 
with a high I.Q. who do very poor work in school, and 
also many cases with a normal I. Q., or even an I. Q. below 
normal, who make the best grades in the class. The latter 
are invariably industrious and persistent. Haggerty re- 
ports the case of a boy who stood third in a class of 60 
in a series of intelligence tests which included the follow- 
ing: opposites, analogies, hard directions, verb-object, 
Trabue completion, and the Thorndike Reading Scale,
Alpha 2. He was later examined with the army examination
A, in which he scored 325 points, placing him easily
in the upper 10 per cent of high school freshmen and the
equal of many college students. He scored 179 points on 
the Otis Scale; yet during his four years in high school 
only twice did he achieve a mark as high as C on a five- 
point scale of marks. 

In contrast with this student was a girl in the same class 
whose score on the army examination A was 231, who 
scored average on the initial tests (ranking 23 in a group 
of 60 entering pupils) and whose I.Q. (Terman) was 108.
In her four years of high-school work, only five times did
this girl make a grade as low as B in an academic subject. 
Twenty-eight times out of 33 her marks were A, and she 
was classed by her instructors as the best student in the
class.

Examples like those cited by Haggerty can be found 
in all large school systems. 

Pressey notes that "if we wish to foretell success in
school we must obtain a measure of school attitude."

Terman insists that "mental tests should be supplemented
by ratings on character traits and by educational
tests."

Haggerty points out that "it is not at all probable that
a perfect measure of intelligence would give a perfect
correlation with school success or with success in later life. 
A more accurate measure of intelligence would only render 
the inadequacy of intelligence more apparent for the simple 
reason that success is not quantitatively coterminous with 
intelligence but with intelligence in combination with other 
significant human traits not subject to evaluation by tests 
of the type currently used as measures of intelligence."

In the "Will-Profile" mentioned above, Dr. June
Downey has attempted to design a scale that consists of 
12 objective tests designed to measure such personal qual- 
ities as assurance, flexibility, speed of movement, motor 
impulsion, resistance, tenacity, coordination of impulses, 
freedom from inertia, motor inhibitions, care for detail, 
speed of decision, etc. 

The author claims that these tests have considerable 
general characterological significance and that they can be
used to advantage in getting the general temperamental 
pattern of an individual and they may also determine 
specific combinations of traits, and, in conjunction with

Ibid., pp. 246-247.



intelligence tests, afford in many situations a basis for conservative
prophecy.

Healy points out another phase of the measurement of
intelligence that is worth noting. There is a type of individual
he calls "verbalists," which is characterized by an
ability to handle language above his ability along other
lines. "On account of the ability of this type to handle
language well, the members of this group are not properly 
placed by the ordinary tests of social intercourse. The 
common method of passing judgment on people is, of 
course, through conversation and also questions, and if one 
gets answers that follow properly, that are consequential 
and coherent, why then without more ado one infers the 
answerer to be practically normal." 

It is interesting to note that the city superintendents
and high-school principals of the Council of Administra- 
tion in the State of Kansas endorsed a plan on January 
20, 1921, whereby many of these non-intellectual traits and 
traits indicated by Dr. Downey's "Will-Profile" are to 
be taken into consideration in the awarding of grades in 
the high schools in the State of Kansas. 

The definition of Grade A as endorsed by that Council 
is given below: 

Grade of A 

1. Scholarship. Exceeding expectations of instructor.

2. Initiative. Contributions exceeding the assignment.

3. Attitude. Positive benefit to class.

4. Cooperation. Forwarding all groups of activities.

5. Individual improvement. Actual and noticeable.

Educators are beginning to ask which is the more signifi- 
cant inquiry concerning a candidate for admission to 
college who has just completed a four-year high-school
course: "What subjects has he previously studied?" or
"What is his general intelligence or ability?" The subjects
a high-school student has previously studied are
significant primarily because they may give a measure of 
his general ability, rather than because they indicate a 
knowledge of any particular fact. The intelligence exami- 
nation can, therefore, in a general way, be valuable in con- 
nection with the selection of students for college admis- 
sion only as supplementing the high-school record and not 
as a substitute for it. 

The question naturally arises as to how the regular room 
teacher may utilize these tests for more efficient school
work. We can speak quite dogmatically and say that they
are a great improvement over the personal judgment of 
the teachers. 

A few suggestions will be offered here relative to the 
use of these tests by the regular room teacher.

1. The teachers should familiarize themselves with some of
the standardized intelligence tests now in use.

2. Care should be taken to see what these tests are designed
to test. A familiarity with the various traits of the mind gained
through the careful study of the tests will quicken the teachers
to detect particular traits very early in the child's career.

3. If the children are given mental tests, the teacher learns
the type of minds with which she has to work.

4. Reasonable improvement can be determined only after the
mental capacity of the individual child has been ascertained.

If progress in school achievements has been slow, the teacher
has a good defense if the general intelligence is low. On the
other hand, if the children have high intelligence quotients and
progress has been slow, the burden of the proof that the school
was well taught would lie on the teacher. Of course, there are
other factors that enter into the problem that must be taken
into consideration.

5. If regular room teachers, not specially trained to give intelligence
tests, should give such tests, the directions for giving the
tests must be followed to the letter and conclusions and implications
must be conservatively drawn.




6. Don't treat tests as "educational curiosities." Their purpose 
is to give information about the developing mind. They suggest 
educational policies. They indicate a more scientific selection 
of subject matter. 

7. If we are to have better tools for measuring intelligence, 
we must discover through the use of the tools we now have 
wherein they are deficient. 

8. The spiritual admonition of the Apostle Paul when he said, 
"Prove all things; hold fast to that which is good," was never 
more needed in the field of religion than in education. 

9. Every teacher should learn the principles involved in test 
making and in giving tests, not always for purposes of diagnosis 
of children but for the guidance of her own behavior towards 
them. 

10. Psychological tests will enable us to predict with a fair 
measure of scientific accuracy the extent to which pupils will 
avail themselves of the opportunities which are set before them. 
They do not constitute a psychological clinic; nevertheless, they 
are valuable in diagnosis. Their purpose is to make a mental 
analysis of the child in order to discover the assets and defects 
so that the examiner may prescribe the proper treatment. 

Bibliography 

1. Burt, C., The Distribution and Relations of Educational
Abilities (King and Son, London).

2. Courtis, S. A., The Gary Public Schools; Measurement of 
Classroom Products (General Education Board, New York, 1919).

3. Haggerty, M. E., The Intelligence Examination (World 
Book Co.). 

4. Healy, W., The Individual Delinquent (Little, Brown &
Co., 1915). 

5. Haggerty, M. E., "Recent Developments in Measuring 
Human Capacities," Journal of Educational Research, Vol. 3, 
April, 1921. 

6. Hollingworth, Leta S., Vocational Psychology (D. Appleton
and Co., 1916).

7. "Intelligence and its Measurements; A Symposium,"
Journal of Educational Psychology, Vol. 12, March and April, 
1921. 

8. Judd, Charles H., Measuring the Work of the Public
Schools, Cleveland Educational Survey (Russell Sage Foundation,
New York, 1916).

9. Link, H. C., Employment Psychology (The Macmillan Co.,
1918).

10. Münsterberg, Hugo, Psychology and Industrial Efficiency
(Houghton Mifflin Co., 1913).

11. National Intelligence Tests, prepared by Haggerty, Terman,
Thorndike, Whipple, and Yerkes (World Book Co.).

12. National Society for the Study of Education, the various
Yearbooks (Public School Publishing Co., Bloomington, Ill.).

13. Otis Group Intelligence Scale, designed by Dr. Arthur
S. Otis (World Book Co.).

14. Pintner, Rudolf, and Anderson, Margaret M., The Picture
Completion Test.

15. Pintner, Rudolf, and Paterson, Donald, A Scale of Performance
Tests (Warwick and York, 1917).

16. Rossolimo, "Mental Profiles; A Quantitative Method of
Expressing Psychological Processes in Normal and Pathological
Cases," Journal of Experimental Pedagogy, Vol. 1.

17. Rusk, Robert R., Experimental Education (Longmans,
Green & Co., 1919).

18. Terman, Lewis M., The Measurement of Intelligence
(Houghton Mifflin Co., 1916).

19. Terman Group Tests of Mental Ability, designed by Lewis
M. Terman (World Book Co.).

20. Yerkes, Bridges, and Hardwick, A Point Scale for
Measuring Mental Ability (Warwick and York, 1915).




CHAPTER V 

THE NEED FOR DEFINITE MEASUREMENTS OF SCHOOL
ACHIEVEMENTS

Time Consumed in Giving Examinations.— It is estimated
that on an average each teacher gives as many as
twenty examinations each year and that it takes approximately
three hours to give each examination and to grade
the papers. If we estimate that there are 600,000
teachers in the United States, this would mean that
36,000,000 hours are spent each year in giving examinations.
Therefore, if the time can be lessened and the accuracy of the
measures increased, the effort is well worth while.
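
The estimate above is simple multiplication and can be checked with a short sketch. The three figures are those given in the text; the names in the code are illustrative only.

```python
# Figures taken from the paragraph above; the names are illustrative only.
TEACHERS = 600_000        # estimated number of teachers in the United States
EXAMS_PER_TEACHER = 20    # examinations given by each teacher in a year
HOURS_PER_EXAM = 3        # hours to give one examination and grade the papers


def total_exam_hours(teachers: int, exams: int, hours: int) -> int:
    """Return the hours spent in one year on giving and grading examinations."""
    return teachers * exams * hours


if __name__ == "__main__":
    print(total_exam_hours(TEACHERS, EXAMS_PER_TEACHER, HOURS_PER_EXAM))
```

Because the total is a plain product, a saving of one hour per examination would reduce it by a third.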

That the examination system is here to stay is an 
obvious fact. The need for some method of determining 
the efficiency of the educational processes is so obvious 
that no argument is needed to substantiate it. The folly 
of putting children through educational processes day 
after day and never knowing what the results are can 
be endorsed neither as a principle nor as a matter of 
expediency. Schools cannot be efficiently operated without
a system of examinations any more than a business
man can run his business without invoicing his stock
occasionally. The best known and, in fact, the only way
to determine the value and status of an educational process
is to take an invoice of the products.

One cannot work long in the field of experimental edu- 
cation without coming to the following conclusions: (1) 
Examinations of some kind are necessary. (2) The examinations
must be of such a nature that they may be given
by the regular room teacher. Only rarely can they be
administered by the superintendent in person or by some
one acting for him.

"A New Kind of School Examination," Journal of Educational
Research, Vol. 1, pp. 33-46.

Though convinced theoretically, many administrators 
find it hard to see just how measurements can be effectively
carried on to advantage in their schools. The accusation
is made that examiners testing their schools carry away
the results and, if the results ever get back to the schools, they
are many times so intangible that they mean nothing as
a factor in determining a school policy. 

We must build our ideal system of education syn- 
thetically, taking the best methods from each of the prev- 
alent groups of theories. But we cannot determine best 
methods until we measure our processes. People generally 
agree in measuring a product, if they can agree on the 
measuring stick. 

Measurement in any department of natural science is 
the comparing of a given magnitude with some convenient 
unit of the same kind, and the determination of how many 
times the unit is contained in the magnitude. The unit 
of measurement is conventional. Its choice is simply a 
matter of practical convenience. 
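
The definition above can be put as a minimal sketch: a measurement is the number of times the chosen unit is contained in the magnitude. The 96-inch board and the function name `measure` are assumptions made for illustration and do not come from the text.

```python
from fractions import Fraction


def measure(magnitude, unit):
    """Return how many times `unit` is contained in `magnitude`.

    Both arguments must be expressed in the same base quantity
    (here, inches), so the result is a pure number of units.
    """
    if unit <= 0:
        raise ValueError("the unit must be a positive quantity")
    return Fraction(magnitude) / Fraction(unit)


# Hypothetical board, 96 inches long, measured with three conventional units.
BOARD_INCHES = 96
UNITS_IN_INCHES = {"inch": 1, "foot": 12, "yard": 36}

if __name__ == "__main__":
    for name, size in UNITS_IN_INCHES.items():
        print(name, measure(BOARD_INCHES, size))
```

The board is the same whichever unit is chosen; only the number reported changes, which is the sense in which the unit is conventional.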

Unfortunately, until quite recently, there have been no 
objective units of measurements of general acceptance in 
the field of education. A good teacher, or a good student,
or a good school was purely a matter of judgment with 
no commonly accepted measuring device to verify the 
judgment made. 

Attitude of Teachers and Pupils Toward Examinations.
— Generally speaking, most teachers and pupils dislike
examinations. Since this feeling is so general and since
so much has been said derogatory of examinations, the subject
challenges us to a careful study as to the causes of
this state of affairs. Are examinations in themselves just




naturally repulsive and obnoxious to pupils and teachers?
Do pupils dislike to have their educational achievements
measured? Each of these questions must be answered in
the negative. There is no part of the educational process
that pupils like better than to have an unbiased statement
as to their school achievements. There is nothing fundamentally
distasteful about an examination, either for the
teacher or pupil. Yet the examinations as now given are
not satisfactory. Since they are necessary and are not
intrinsically and fundamentally repulsive, the cause of
their dislike must be referred to one or more of the following:
(1) the methods of devising the examination; (2)
the methods of administering the examination; or (3) the
methods of scoring the papers. Improvements along one
or all of these lines will help to remove the obnoxious 
features from examinations and make them a pleasure in- 
stead of a piece of drudgery that must be tolerated. 

School achievement tests, discussed in this chapter and
Chapters VI and VII, are nothing more than improved
examinations. It is confidently believed by the writer that 
improved methods of devising and administering examina- 
tions and scoring examination papers, such as we shall 
have eventually in the form of standard tests, will make 
contests in school achievements as interesting as athletic 
contests are at the present time. There is no doubt but 
that the case of scientific measurements has been argued 
and won. Of course, the fundamental thing in school 
achievement tests is not to provide mere entertainment but 
to devise a system of measurements that really measures. 
Nevertheless, if the general attitude of teachers and pupils 
regarding examinations can be changed so that they may 
be looked upon as a pleasure instead of drudgery as they
are considered now, attempts at improving them are worth
while.

That both teachers and pupils need "professional




hurdles," quantitatively stated goals, set before them,
against which their developing effectiveness may be frequently
checked up, is a recognized principle in education.
It is one method of keeping them up to optimum efficiency.

The Marking System now in Vogue. — If there is any
one phase of school work in which both the teachers and
the public apparently have had an abiding faith, it has
been in the ability of teachers to express definitely and
concisely in per cent, letters, or adjectives, the exact progress
a pupil has made in his school work. As evidence
of this faith we have allowed these marks to determine the
fitness of candidates for college; their eligibility for
athletics; scholarships and fellowships, which amount to
hundreds of thousands of dollars every year, have been
granted with practically no other evidence of the candidate's
fitness; these marks have determined the fitness of
candidates for civil service positions, admission to Phi Beta
Kappa, "cum laude," "magna cum laude," and other
special honors granted certain members of graduating
classes who were successful in getting the requisite number
of A's or H's or other high marks of distinction.

Parents look forward with a great deal of anxiety to 
the time when Johnny will bring home, in the form of a 
monthly or quarterly report card, a record of his achievements.
A fair "sprinkling" of A's and B's is sufficient
evidence to convince the average father or mother that
Johnny's progress is satisfactory, so with a great deal of
pride, but little real information, the report card is signed
and returned to the teacher. If the grades were poor the
parents might question the honesty of the teacher but
never the reliability of the marking system. 

Grades A, B, C, and D usually represent degrees of
excellence between 100 per cent and 60 per cent of "something";
no one knows exactly what. A pupil's "passing
mark" is usually a grade equivalent to 75 per cent of




"something," and if he gets an average grade of 95 per 
cent of this "something," he may be valedictorian of his
class. 

If school marks are so inefficient, the question immediately
arises as to whether some "inventive genius"
or "wizard" has invented some system of scales and units 
by which school achievements may be measured with the 
exactness that the apothecary compounds his medicines, 
or the mechanic measures the diameter of a piston head. 

The answer to this question is perfectly definite. There 
are no scales or units yet invented that will measure school
achievements with anything like the exactness that it is 
possible to get in the physical sciences. It should be noted, 
however, that the precision that the natural sciences enjoy 
was not developed in a day. The sciences have been evolved 
a step at a time from humble beginnings to the high state
of perfection they now enjoy. The Wright brothers did
not equip their first aeroplane with a Liberty motor.
Robert Fulton's steamboat would be a sorry-looking spectacle
beside the great floating palaces that now cross the
Atlantic in three or four days; and Stephenson's wheezing
little locomotive would certainly be at a disadvantage in
competition with a "Twentieth Century Limited." Each
of these great inventions has had the most humble beginnings,
and their evolution may be traced a step at a time
until the present state of efficiency is reached.

In the same way we shall have to refine our standards 
and units of educational measurements. There has been 
too much absolutism in education and too little of a realism
that sees the good and bad in all and diminishes the bad 
and augments the good. If we adopt this view we become 
really empirical, living through each educational experiment
to incorporate it into a growing treasury of tested
theory, not deducing success or failure from metaphysical 
or doctrinaire prejudice. 




It may not be too much to predict that twenty-five years
from now our present scales and standards will be as much
out of date as Fulton's steamboat. Something like perfection
will have been reached when we are able to construct
scales by scientific methods that can be applied as
the foot rule is now applied, regardless of time, or place, 
or person. 

Scientific Measurement of School Achievements Is New. 
— It is only within the last dozen years that any real 
progress has been made in the scientific measurement of 
school products. In 1908 Dr. C. W. Stone, a student under
Thorndike, published his arithmetic test, and the following
year Thorndike presented his scale for handwriting before
the American Association for the Advancement of Science,
then meeting in Boston. These two tests represent the
real beginnings of the scientific measurement of educational
products. Thorndike has been called the father of the
educational measurement movement in America, and Dr.
J. M. Rice, mentioned in the introductory chapter, the
grandfather. Inspired by the work of Rice, Stone, and
Thorndike, Courtis soon followed with an arithmetic test
in the four fundamentals which went through a number
of revisions until it took the present form now known 
to us as The Courtis Standard Research Tests in Arith- 
metic, Series B. At the present time the number of 
standard educational tests that have been developed for 
the various subjects is probably somewhere between 100 
and 150. 

In spite of the fact that practically every educational 
meeting of note for the last ten years has devoted a great 
deal of the time to the subject of the scientific measure- 
ment of school achievements, and also the fact that educa- 
tional literature is replete with discussions of the new 
movement, yet at the present time most teachers accept 
the fact of educational measurements in a passive way, 



and a large majority have never made use of these new
educational tools. 

On the other hand, there is a minority group of teachers 
who have looked upon these tests and scales as instruments 
for refining their cruder processes, and they are making 
rapid strides in this direction. The fault, however, does 
not lie altogether with the teachers. The administrative 
machinery, both from the standpoint of the state and local 
district, is organized and operated on the old system. 
Grades are still recorded and promotions made on the old 
basis. An administrative change must be brought about 
before the new movement has the complete right of way. 

Those in the Profession Must Take the Initiative for
Improvement. — Professional improvement must come
from within and not from pressure without the profession.
The medical profession has not advanced because the 
lawyer, or the minister, or the business man forced up 
their standards. They were forced up by the leaders in 
the profession. School management will never become more 
scientific than the teachers themselves make it. The more 
teachers that may be induced to accept improved methods, 
the sooner the teaching profession will reach a stage where 
teachers may be held morally, if not criminally, liable for
malpractice just as practitioners are in the medical profession.
Before this can be done teachers must be thoroughly convinced
that the old methods are inadequate and that the
new ones offer better opportunities so that they will not
hesitate to adopt the new. Of course we cannot expect a 
complete transition over night. Changes have not been 
wrought in other sciences that way, and progress is pretty 
much the same in all sciences. The "arts of healing," 
for instance, of our ancestors, the product of ages of selective
effort, have given way to the modern science of
medicine. However, some people still cling to the dogmas 
and cures of the older medicine as to cherished heirlooms. 




Since the transition from magic and faith cures, the med- 
ical profession has become so strongly entrenched by the 
accumulation of scientific facts that it yields to no social 
force. Even the military takes orders from these men of 
science. 

There are, of course, new fields being explored where it 
is still a matter of opinion as to what procedure is best;
but along with these there is a procedure in some of the 
more familiar fields that is so definite that any departure 
from it makes the practitioner liable to a fine for malpractice.
A more profound insight into scientific methods
and a greater sensitiveness of our shortcomings are the 
twin forces that are operative in convincing men that 
human incapacity, suffering, and waste can be reduced and 
life made better through more purposeful use of scientific 
knowledge as it becomes more accessible. 

Purposes of Educational Tests Are Not Generally
Understood. — The work in educational reform has been
hindered a great deal by a lack of understanding as to 
what the purposes of tests are. Monroe cites the case of 
a school superintendent in a city of more than 50,000 
population stating that he did not believe tests had much 
merit. His reasons were that he had given the Courtis
Standard Research Tests in Arithmetic and the children 
did not do any better after the tests were given than 
they did before, just as though standard tests were
teaching devices that would improve their arithmetic 
ability. That is analogous to the case of a mother who, 
being anxious for her baby to gain in weight, said that 
she would not weigh her baby any more because weigh- 
ing it from time to time did not increase its weight. 

The Problem to Be Solved. — The first problem that con- 
fronts one attempting to measure school achievements is 
that of finding some convenient units of measure to apply 
to the achievement in question. If one wants to know 




the length of a board or the width of the street, there 
are several convenient units with which to measure them. 
They may be measured in feet, inches, yards, meters, 
centimeters, or other derived units. If one wants to know
how heavy a thing is, there are well-defined units at his 
disposal such as the ounce, pound, gram, and kilogram. 
Temperature is measured in degrees, electricity in kilowatt-hours,
etc.

When one attempts to measure the handwriting of an
individual, or how well he can draw or spell or how well 
he can read and cipher, there are no such convenient 
units at his disposal. One of the first problems to be
solved, therefore, by those desiring to measure school 
achievements is to devise scales and units that may be 
used in measuring these products. In linear measure and 
measures of weight, the steps on the scale are the same, 
that is, the difference between 1 pound and 2 pounds is 
the same as the difference between 21 pounds and 22 
pounds, or between 48 and 49 pounds. The question 
immediately rises as to whether scales for measuring edu- 
cational products may be made and used as the above- 
mentioned scales are used in other sciences. 

What School Achievement Tests Measure. — The first 
thing to be noted is what a test measures. Professor 
Courtis has given us three terms which are helpful in bear- 
ing this thought in mind. These terms are: (1) capacity, 
or the original intellectual endowment of the educand when 
he enters school; (2) ability, which is capacity plus train- 
ing; and (3) performance, or that which the individual 
actually does on a given test. It is a rare case when one's 
ability is equal to his capacity. That is, the native endow- 
ment is such that, generally speaking, it might have been 
cultivated into more ability. It is also true that perform- 
ance is usually less than ability. It is a rare case when 
one is taking a test that all the ability is utilized. When 




an eighth-grade boy takes a test in addition, for instance, 
and works as hard and rapidly as he thinks it is possible,
he could, no doubt, add another problem to his credit if
the reward offered were made large enough. In school 
achievement tests, we are testing performance and not 
capacity or ability. It should be noted that the word 
"performance" is used here in a different sense from 
that used in discussing tests of intelligence. There it was
used in a somewhat restricted sense, meaning activities 
which do not depend on the ability to read. 

Experimental Evidence to Show that School Marks Are 
Inadequate. — As was indicated in the introductory chapter,
one of the first things that must be done to make
measurements of educational products more nearly exact 
is to convince teachers that the old marks now being used 
are inadequate. It seems fair to assume that with an 
efficient measuring device or yardstick any number of 
competent persons measuring the same thing ought to get 
approximately the same results. It is not expected that 
the results will be exactly the same because exact measure- 
ments are impossible in any field. If, for instance, 116 
competent persons were given a measuring stick and asked 
to measure the length of a board the length of which was 
known to be somewhere between 0 and 100 inches, and,
if the results obtained varied from 28 to 92 inches, we 
would conclude that there was something wrong with the 
measuring device. If the measuring were done with a foot 
rule divided into inches, a variation of an inch or two 
would be the maximum we would expect.

Starch and Elliott investigated the accuracy with which 
116 teachers were able to measure the merits of a geometry 
examination paper. A facsimile reproduction of the 
examination paper was made and sent to each of the high 
schools included in the North Central Association of 
Colleges and Secondary Schools with the request that the 




geometry teachers in the various schools mark it on a 
basis of 100 per cent. One hundred and sixteen teachers, 
in as many schools, complied with the request. The grades
ranged from 28 to 92 per cent. The conditions are 
analogous to those mentioned above in measuring the board. 
The supposition is that the teachers asked to measure the 
merits of the paper are competent persons or they would 
not have been employed to teach mathematics in the high
schools. Another factor that deserves special attention is 
the fact that a geometry examination paper is supposed 







to be of such a nature that it may be graded with a much 
higher degree of accuracy than an examination in literature,
history, or reading, for instance. Figure 11 shows the
distribution of marks as they were compiled from the reports
of this test. Two of the marks were above 90 while
one was below 30. Thirteen teachers gave the paper a grade 
of 75 per cent. Twenty-seven marks were above 80 while 
20 others were below 60. Considering a grade of 75 per 
cent as a "passing mark," 47 of the 116 teachers considered
the paper to have sufficient merit "to pass," while




69 thought the paper not worthy of a passing mark. This
gives a range of more than 60 per cent. Two or three
inches would be the maximum variation we would expect
in the case of measuring the board above mentioned.

The case of measuring the board would have been more 
nearly analogous to that of measuring the examination
paper if instead of using a rigid foot rule we had used a
piece of elastic. In that case it would all depend on the
tension of the elastic as to what results were obtained. It
is as difficult even for the same teacher to grade a paper
twice, with a sufficient time interval so that she does not
remember her former grade, and give it the same score 
each time, as it would be to make two measurements of 
a board with a piece of elastic and get identical results 
because she forgets what the former tension of the elastic 
was. 

Teachers' unaided judgments are as elastic as the piece
of rubber, and the purpose of the new tests and standards 
is to give the marks some fixity so they will be more nearly 
constant. If, by these new standards, the variation can 
be reduced by even one-half, so that in the case of the 
geometry paper instead of having a variation in the scores 
of more than 60 per cent (from 28 to 92), it may be reduced
to 30 per cent, or even less, a wonderful change for
the better will have been wrought. 
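
The figures reported for the geometry paper can be tallied in a short sketch. Only the summary numbers stated in the text are used (the lowest and highest marks and the counts of passing and failing judgments); the individual marks are not reproduced here, and the names in the code are illustrative.

```python
# Summary figures from the Starch and Elliott geometry paper as quoted above.
LOWEST_MARK = 28     # per cent
HIGHEST_MARK = 92    # per cent
TEACHERS = 116       # teachers who marked the paper
PASSED = 47          # marks of 75 per cent or more
FAILED = 69          # marks below 75 per cent


def spread(lowest: int, highest: int) -> int:
    """Return the range of the marks in percentage points."""
    return highest - lowest


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    full_range = spread(LOWEST_MARK, HIGHEST_MARK)
    print("range of marks:", full_range)
    print("range if halved:", full_range / 2)
    print("per cent of teachers passing the paper:", share(PASSED, TEACHERS))
```

The range is a crude measure of disagreement, since it depends only on the two extreme marks, but it is the one the argument above relies on.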

Starch and Elliott not only investigated the ability of 
geometry teachers to mark geometry papers but made 
similar tests in history and English. In all three cases 
the variations were very great. The papers in each case 
were presumedly graded by experts in the various fields. 

Marking System Inefficient because It Does Not Indicate
Progressive Degrees of Merit. — If one were asked to
grade a composition, one of the first things he would want
to know would be what age, grade, or class of pupil wrote
it. If told it was written by a pupil in the second grade,
a boy six or seven years old, it might get a mark of 95
or 100 per cent, but if told that it was written by an 
eighth-grade pupil, a mark of 60 per cent might be given. 
Now it is evident that the composition has the same merit 
whether written by a second-grade pupil or a college 
graduate. Instead of having one scale for grading com- 
positions the person grading the paper uses a different 
scale for every grade in school. A boy may get 95 per 
cent in his composition work from the first grade to the 
time he graduates from college. Since it is obvious that 
he can write a better composition in college than he can 
when he is only a second-grade pupil, there should be some 
way of expressing in definite units of accomplishment the 
advancement made. 

Definition of a Scale. — It is to correct this defect that 
educational scales are being devised. The word "scale" 
comes from the Latin word scala, which literally means a 
ladder, or a staircase, a long straight article divided into 
a series of equal steps and readily lending itself to use 
as a measure. In the physical sciences the scale reaches 
from the lowest to the highest. A scale for measuring 
weight reaches from the lowest, or smallest amount we 
desire to measure, to the highest, or greatest. A scale for 
measuring length or time reaches from zero to some upper 
limit as high as we desire to go. As used in education 
it is thought of as a linear rule extending from the worst 
to the best ; or from an educational product of least merit 
to one of greatest merit; from problems of least difficulty
to those of greatest difficulty, and so on. Measurement 
is in terms of relative merit, showing whether a given 
product or performance represents an achievement that is 
halfway along the scale from worst to the best or at a 
point 80 per cent of the distance from the zero point to 
the high end, or some other definite location. 
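Locating a product "halfway along the scale" or "80 per cent of the distance" is a simple proportion. A minimal sketch follows, assuming a hypothetical scale whose low and high ends are marked 0 and 20; the function name and the end values are illustrative and are not taken from any published scale.

```python
def relative_position(value, worst, best):
    """Fraction of the distance from the low end of a scale to the high end."""
    if best <= worst:
        raise ValueError("best must be greater than worst")
    return (value - worst) / (best - worst)

# Hypothetical scale running from 0 (least merit) to 20 (greatest merit).
print(relative_position(10, 0, 20))  # a product halfway along the scale
print(relative_position(16, 0, 20))  # a product 80 per cent of the way along
```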

We are apt to ignore the relative worth of things. When
we attack a standard in school, we set absolute perfection
as our own standard. We fail to consider the fact that 
the final increment of any subject is frequently obtained 
at an expenditure of energy out of all proportion to its 
worth. Scales will help in discovering what it costs 
in time, energy, and practice to reach the various standards 
set. We shall expect that curricula may be constructed 
on the basis of aims or objectives that are scientifically 
determined and are not the chance product of personal 
preference and opinion. The function of the scale is to 
take the score resulting from the test and interpret it in 
terms of relative merit.

The test is analogous in many respects to the pea, or 
sliding weight, on the scale beam and is used to determine 
the exact location of the individual on the scale. Making 
a test too easy would not determine the position of an 
individual on the scale any more than setting the pea at 
10 pounds would determine the weight of a body weigh- 
ing approximately 100 pounds. 

Just as one estimates the weight of an article by its size 
or by lifting it and sets the pea at the estimated weight 
on the scale beam, so the test must be arranged so that 
the point reached on the educational scale will lie within 
the range of the test. If, for instance, all the problems 
are solved in a speed test in arithmetic before time is called, 
one would not know whether the upper limit of a child's 
ability had been reached any more than he would know 
the weight of a thing if he were to set the pea at 100 
pounds on the scale beam and did not move it above or 
below that mark to see if the article would weigh more or
less than 100 pounds. 
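The point about a test that is too short can be put as a simple check. The sketch below is illustrative only: the function name and the wording of the reports are assumptions. It treats a score as a true measure only when the pupil stopped short of the last problem.

```python
def report_speed_score(problems_done, problems_on_test):
    """Describe a speed-test score, flagging a pupil who ran out of problems."""
    if problems_done >= problems_on_test:
        # Every problem was finished before time was called, so the test
        # shows only a lower bound on what the pupil can do.
        return f"at least {problems_done}; upper limit not reached"
    return f"{problems_done}; within the range of the test"

print(report_speed_score(18, 24))
print(report_speed_score(24, 24))
```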

Progress from grade to grade cannot be measured 
accurately if the per cent system, as now used, is to be 
employed. The Thorndike Handwriting Scale, while inefficient
in many ways, possesses the essential characteristic
of indicating degrees of merit from the lowest to
almost perfection. The various specimens of handwriting 
start with Quality 4, which is recognized as handwriting 
but almost entirely illegible, and go to Quality 18, which 
is well on the way towards perfection. The quality of a 
specimen of handwriting is determined by sliding the 
specimen to be measured along the scale until that quality 
is reached which most resembles the sample in question. 
The age or grade of the writer has nothing to do with the 
quality. An eighth-grade boy and a first-grade boy may 
write specimens of exactly the same quality. 

An Ideal Scale Must Have Equal Steps and Each Step
Must Bear a Definite Relation to the Zero Point. — It was
indicated in the introduction that our present marking
system follows no known rules of mathematics. A grade 
of 80 per cent does not mean that the product is twice as 
good as that given a grade of 40 per cent, and 75 per cent 
is not one and a half times as good as a grade of 50 per 
cent, because there is no zero point established to which 
these marks bear a definite relation. Our new educational 
scales are built on the same plan as the scales for length 
and weight in the physical sciences. In order to obtain 
this characteristic of having successive steps on the scale 
indicate progressive values increasing by equal amounts, 
the new scales are based on the theory of the so-called 
normal distribution of intellectual ability. This distribution
is represented graphically by a bell-shaped curve, or
it is like the cross-section of a pile of sand dumped from 
a cart. The following illustrations of a normal distribu- 
tion surface will help clarify our conception as to how the 
new educational scales are made. 
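Why a ratio between two marks has no meaning until a zero point is fixed may be seen from a short calculation. In the sketch below the function name is illustrative, and the supposition that work of no merit at all might still be marked 20 is a hypothetical value chosen only to show the effect.

```python
def ratio_from_zero(a, b, zero_point):
    """Ratio of two scale readings, each measured from a stated zero point."""
    return (a - zero_point) / (b - zero_point)

# If the zero of merit really lay at a mark of 0, 80 would be twice 40.
print(ratio_from_zero(80, 40, 0))

# If work of no merit would still receive a mark of 20 (hypothetical),
# the same two marks stand in quite a different ratio.
print(ratio_from_zero(80, 40, 20))
```

The same pair of marks gives a ratio of 2 under the first supposition and 3 under the second, so the statement "twice as good" depends wholly on where the zero is placed.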

If one were to pluck all the leaves from an oak tree
and arrange them in a long line according to their lengths
so that the shortest leaves would be at the left end of the
line and the longest ones at the right, he would find that
he had very few exceptionally short leaves, a great number
of medium length, and very few exceptionally long ones. 
If these were represented by a surface of frequency, the 
diagram would be the bell-shaped curve mentioned above 
(see Fig. I, page 77). Now, if the number of leaves of 
various lengths were represented by the height of the curve 
above the base line, at the extreme left the curve would 
come very close to the base line because there are very
few leaves exceptionally short. As the leaves grow pro- 
gressively longer their number also increases, and the curve 
rises rapidly from the base line, first concave then convex, 
until the peak is reached. It then descends toward the 
base line making a symmetrical curve. 

If 10,000 adult males, all belonging to the same race,
were chosen at random and lined up in the order of their 
heights, the short ones at the left end of the line and the 
tall ones at the right, their distribution would approximate 
very closely that mentioned above in reference to the leaves. 
That is, there would be a few exceptionally short ones, a 
great many of medium height, and very few giants. The 
weights of individuals, the strength of their grip, and, in 
fact, most of their physical traits are found to obey the 
same law. 
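The shape described above, with few at either extreme and many in the middle, can be illustrated with the normal curve itself. The sketch below assumes a mean height of 68 inches and a standard deviation of 2.5 inches; both figures are hypothetical and serve only to show the pattern for 10,000 men.

```python
from statistics import NormalDist

# Hypothetical figures, chosen only for illustration: mean height of
# 68 inches with a standard deviation of 2.5 inches.
heights = NormalDist(mu=68, sigma=2.5)
men = 10_000

def expected_count(low, high):
    """Expected number of men whose height lies between low and high."""
    return men * (heights.cdf(high) - heights.cdf(low))

very_short = expected_count(float("-inf"), 63)  # more than 2 SD below the mean
medium = expected_count(63, 73)                 # within 2 SD of the mean
very_tall = expected_count(73, float("inf"))    # more than 2 SD above the mean

print(round(very_short), round(medium), round(very_tall))
```

Under these assumptions the two extreme groups are equal in size and together make up only a small fraction of the whole.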

The question then arises as to whether the intellectual 
traits of individuals distribute themselves in the same way. 
It has been found that if tests are given in any grade as 
the fifth, for instance, in the subject of spelling, a few 
exceptionally poor spellers are found, a large group of 
spellers with median ability, and a very small group of 
exceptionally good spellers. The same distribution is 
found if any considerable number of children in any grade 
are tested in any subject. In other words, the same law that 
operates in the physical and biological sciences is also
operative in reference to intellectual traits. 

The problem of scientific scale building is further simplified
by the fact that mathematical laws concerning the
characteristics of the surface of normal distribution have 
been most accurately determined, and by the application 
of those laws it is possible to locate and determine the steps 
on the different scales for school achievement. 
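One way in which the laws of the normal surface may be used to locate steps on a scale is sketched below. It is an assumption made for illustration, not a procedure stated in the text: each task is placed according to the share of pupils who succeed with it, and that share is converted into a distance from the median in standard-deviation units. The function name and the three shares are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def scale_position(share_correct):
    """Place a task on the scale, in standard-deviation units from the median.

    share_correct is the fraction of pupils (strictly between 0 and 1)
    who perform the task successfully. A task that fewer pupils master
    lands higher on the scale.
    """
    return standard_normal.inv_cdf(1 - share_correct)

# Hypothetical shares of pupils solving three problems.
for share in (0.84, 0.50, 0.16):
    print(share, round(scale_position(share), 2))
```

A task solved by exactly half the pupils falls at the median, while tasks solved by 84 and 16 per cent fall at equal distances below and above it, which is what gives the scale its equal steps.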

Now since it is found that intellectual traits distribute
themselves according to a normal or symmetrical distribution,
if one were to examine the grades given by teachers
and found they were distributed in some other way than
according to a normal distribution of frequency, it would
be perfectly legitimate to question the marking system.

[Figure III. Distribution of grades in the University High School (after Johnson).]

An investigation of this kind was made by F. W. Johnson,
principal of the University High School of the University
of Chicago, of the grades given for the school years
1907-8 and 1908-9. Figure III shows the distributions
of the grades.¹ It may be seen that the distribution of
grades here does not conform to a normal distribution
surface.
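A comparison of this kind may be sketched as follows. The counts are hypothetical and are not Johnson's figures; the division of the normal surface into five bands of equal width, and the use of the largest difference in share as the measure of disagreement, are likewise assumptions made for illustration.

```python
from statistics import NormalDist

def expected_shares(cuts=(-1.5, -0.5, 0.5, 1.5)):
    """Share of a normal distribution falling in each band between cuts.

    The cut points, in standard-deviation units, are an assumption: five
    bands of equal width centred on the mean.
    """
    z = NormalDist()
    edges = [float("-inf"), *cuts, float("inf")]
    return [z.cdf(hi) - z.cdf(lo) for lo, hi in zip(edges, edges[1:])]

def largest_gap(counts):
    """Largest difference between observed and expected share in any band."""
    total = sum(counts)
    return max(abs(c / total - e) for c, e in zip(counts, expected_shares()))

# Hypothetical counts of marks, lowest band first.
piled_at_top = [5, 10, 25, 60, 100]
bell_shaped = [14, 48, 76, 48, 14]

print(round(largest_gap(piled_at_top), 2), round(largest_gap(bell_shaped), 2))
```

A set of marks piled up at the high end departs widely from the expected shares, while a bell-shaped set agrees with them closely; a large gap is a reason to question the marking, not proof of its cause.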

Another investigation which throws some light on the
inadequacy of school marks is that made by Dr. F. J.
Kelly. In 1913, Dr. Kelly made an investigation in four
ward schools in Hackensack, N. J. to determine the marks
given sixth-grade children, and compared these marks with
those received when they went to a common departmental
school for seventh grade work. A subject such as arithmetic,
which was taught in the ward schools by four
teachers, is now taught by one. That is, by the departmental
plan one teacher had the arithmetic, another the
language, a third the history, and so on. This gave an
opportunity to check up the grades and find out if a "G"
(Good) in one ward school meant the same as a "G" in 
another. Since all the ward schools were under the same 
management, being a part of one system, we would expect 
a "G" in one school to mean the same as a "G" in an- 
other. This condition, however, did not obtain. 

Kelly found that for work which the teacher in school
"C" (one of the ward schools) would give a mark of "G"
(Good) in language, penmanship, or history, the teacher
in school "D" (another ward school) would give less than
a mark "F" (Fair).²

¹ School Review, Vol. 19, pp. lS-24.

² F. J. Kelly, Teachers' Marks, Teachers College Contributions to
Education, No. 66, 1913, p. 7.

More Exact Measurements Will Make Education a
Science. — The application of scientific measurements to
school products is doing more to make education a science
than any other contributing cause. It is giving education
tools with which to work. In this field we are making
wonderful strides. The soil is virgin; hence it will take
educators a long time to put it on a plane with the other
sciences. For instance, a few years ago a reading test
seemed impossible; to-day, we have mastered the distinc- 
tion between oral and silent reading. We have good 
methods of measuring some of the more common types of 
deficiencies, and we know the rate of progress which is 
normal in the more obvious phases of interpretation. The
advantages of tests are in the same line as the advantages 
of the thermometer over saying, "The heat is stifling," "It 
is very hot," "It is boiling hot," etc.; or of saying, "He 
is the tallest person I ever saw," or "the biggest in the 
state of Massachusetts."

It may not be too much to predict that some day concentration
of attention, ability to attack various kinds of problems,
clearness of insight, power of inference in various
fields, and other abilities will be measured. An educational
product, as a composition, is usually a complex, and its
measurement is more like measuring a house or an elephant 
than measuring a length or a volume. It must not be 
expected, therefore, that a thing may be completely 
described quantitatively when it is measured once. 

Supervision Improved as Ability to Measure Increases.
— Standard tests will greatly improve our supervision.
Our children will be more rapidly and effectually taught. 
Inefficient teachers will no longer be able to hide behind 
our ignorance. Dubious aims of education and aims too 
remote to be effective to the general practitioner will be 
replaced by goals that are in sight, by a motive which is 
dynamic and energizing, and by an appeal which will 
spur pupils on to a greater effort. The pupil wants to 
know how he is getting along. He wants to know where
he stands in reference to an impartial standard. He wants
to know if his performance each succeeding day and week
brings him a little nearer the coveted goal. The teacher 
is eager to know what degree of success her efforts have 
won as measured by the quality of the work done by her
pupils. It must be borne in mind, of course, that an educational
product is a complex. It is the resultant of a
great many causes, of which the school is but one. The 
home, the native capacity, the amount of study done by 
the pupil, the influence of the street, the playground, the 
state of health of the educand, and many other factors
enter into this complex. To isolate and measure the school 
factor is a difficult task to perform. One must be very 
careful about drawing conclusions too hastily when schools 
are compared. Suppose for instance, school A were com- 
pared with school B in the four fundamentals of arith- 
metic and the median score for school A was three prob- 
lems more than school B. Teachers must not draw the 
conclusion that the difference was caused exclusively by 
better teaching in school A. It may be that more time 
was devoted to the four fundamentals in arithmetic in 
school A than school B. Or the native capacity of chil- 
dren in the former school may have been much better 
than that in the latter; they may have studied harder; 
they may have received more help and encouragement from 
home; the general status of health may have been better
than in school B. Along with these factors may have been
the fact that school A had better teachers and better teach- 
ing methods than school B. Or, the opposite may have 
been true. Teaching methods cannot be compared in this 
way. If one wanted to compare the merits of two methods 
of teaching a certain subject, as, for instance, the auditory 
and the visual methods of teaching spelling, the teaching 
factor may be isolated and measured with a fair degree of 
accuracy in the following manner : Take two classes from 
the same school, or from different schools, which apparently
have the same spelling ability as measured by some
spelling scale such as the Ayres Spelling Scale. Combine 
the study period with the recitation period and have one 
class study spelling by the visual method, where the teacher
writes the words on the blackboard or has some other
way of presenting them visually. In the other class the
same words are to be studied, but all work is to be done
orally and the children are not to see the words either
in script or print for purposes of study. No work is to
be done on the spelling lessons outside the classroom. At
the end of one month a test may be given and the merits
of the two methods determined. These conditions would
be about as nearly controlled conditions as one could get
in ordinary school-room procedure. Even here the difference
in native capacities of the two classes, the difference
in the two teachers, if two were employed, the difference 
in application of the pupils and many other factors would 
condition the results to some extent. 
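The comparison of the two spelling methods may be sketched in a few lines. The scores below are hypothetical, and the use of the change in the class median as the measure of improvement is an assumption made for illustration; the text does not prescribe a particular statistic.

```python
from statistics import median

def median_gain(first_test, second_test):
    """Change in a class's median score between two testings."""
    return median(second_test) - median(first_test)

# Hypothetical words-correct scores for two classes judged equal at the start.
visual_before = [10, 12, 13, 15, 16]
visual_after = [14, 16, 18, 19, 21]
oral_before = [10, 12, 13, 14, 17]
oral_after = [12, 15, 16, 18, 19]

# The classes should start level before the methods are compared.
assert median(visual_before) == median(oral_before)

visual_gain = median_gain(visual_before, visual_after)
oral_gain = median_gain(oral_before, oral_after)
print(visual_gain, oral_gain, visual_gain - oral_gain)
```

As the text cautions, a difference in gain found in this way still reflects the teachers, the pupils' application, and their native capacities as well as the method.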

Our Educational Scales Have Been Subjective. — The
problem for scientific education is to make our educational
scales objective and universal. They must be so
constructed that there can be no misunderstanding about 
them when they are placed in the hands of competent 
teachers. Educational scales make it possible to define 
educational products far more precisely than without 
them. Just as in the physical sciences it is far more 
nearly exact to say the temperature is 106 degrees 
Fahrenheit than it is to say, "It is the hottest day I ever 
saw," or "The day is boiling hot," or "It is an awful 
hot day," so in education we are refining our cruder 
terms and making them more definite. When an indi- 
vidual says it is the hottest day he ever saw, no one but 
the speaker knows what is meant, but when he says it 
is 106 degrees Fahrenheit any intelligent individual in 
any part of the world understands exactly what is meant. 
The expression, "the hottest day I ever saw," is as 
definite, however, as the 85 per cent the teacher gives
a boy in penmanship, because no one knows but that
teacher what it means. Before Thorndike made his handwriting
scale educators were in the same condition with
respect to handwriting as scientists were in respect to 
temperature before the discovery of the thermometer. In 
that day it was not possible to measure ordinary tem- 
perature beyond the cold, cool, warm, hot, and very hot, 
of subjective opinion. 

We can easily measure the salaries of teachers because 
we have a scale of money price. We can measure the 
amount of time given by teachers because we have the 
best scale that the world knows, the scale of time divided 
into seconds, minutes, hours, etc. The most abstract thing 
in the world is a scale for length, weight, time, or units 
of temperature. As little as we care about scales they
are among the most important things in the world. The 
skill and courage of the most daring seamen that ever 
traveled the seas, millions of them put together, do not 
do as much for the practice of navigation as the mariner's 
compass which is simply a scale for telling direction. 
While tests limit the "spread-out-ness" of our units of
measurements, they do not give us a definite point on 
the scale. They do, however, give us narrow and definite
boundary lines within which the measures lie.

Tests Do Not Indicate the Cause of Conditions. — A
school achievement test, or any other kind, does not indicate
what brought about the conditions found. It simply
says that a certain state of affairs exists. When a
physician's thermometer shows a temperature of 103° it does
not indicate, in the least, whether the high temperature 
is caused by typhoid fever, influenza, diphtheria, or what 
not. It simply says that the patient's temperature is 103°. 
This is a result of perhaps many causes, and further
diagnosis is necessary to determine what brought about 
this condition. In exactly the same way a score made 
in any subject is the resultant of many causes, of which 
teaching may be one. Teachers must use caution in
assigning conditions to definite causes unless it is
absolutely known that the condition was brought about
by the cause assigned.

How Standard Tests Differ from Ordinary Examinations.
— Perhaps one of the best ways to show the difference
between a standard test and an ordinary examination
is to note, in a general way, the steps in making a
standard test. To the uninitiated, school achievement 
tests look so much like ordinary examinations that 
teachers often wonder what the advantages of the stand- 
ard tests are over the examinations. 

The superiority of the standard test over the ordinary
examination may be shown best, perhaps, by indicating 
the problems in making a standard test. The procedure 
is somewhat as follows: The first thing to be determined
is the kind of a test desired; that is, shall the test cover
several phases of the subject, or one? Shall it be a
rate test, telling how much a pupil can do in a given
time, or shall it be a difficulty test answering the question
"How hard a problem can a child solve?" Or shall
it be a quality test answering the question "How well
can a child do a given task?" When the type of test
has been chosen, then those making the test must decide 
upon the following points: Shall the test cover several 
parts of a subject as, for instance, the four fundamentals, 
common and decimal fractions, and percentage in arith- 
metic, or shall it be diagnostic and cover quite completely 
just one phase of the subject, as addition or division? 
Or, if the subject is language, shall the test cover in a 
general way all parts of speech, or shall it cover just 
verbs or pronouns? This having been decided the next 
point to determine is the principles on which the test 
questions shall be based. For instance, if a pronoun test 
is being made, shall the choosing of the test questions 
bear some definite relation to the type of errors people
make in the use of the pronoun forms, or shall something
else be taken as a basis for choosing the questions? If
more errors are made in the use of some pronoun forms 
than in others, shall there be more test questions on these 
particular forms, assuming that the test questions have 
been selected on some such basis as the above? 

The next questions to decide are how many questions 
or parts shall go in the test and approximately how much
time should it take to do the test. The number of ques- 
tions or divisions is determined in part by the subject 
matter itself, and in part by arbitrarily fixing the length 
of the test. The approximate length of the test having 
been determined, the next thing is the actual drafting 
of the questions or parts and the submitting of these 
parts to a representative group of children usually from 
2,000 to 5,000 in number. This is called the preliminary
test upon which the standardized test may eventually be 
based. It may be, however, that the preliminary test 
will show that the mode of attack is not good and the 
whole thing must be "scrapped" and a new mode of 
attack adopted. Assuming that the mode of attack was 
satisfactory and that the preliminary test indicates that 
the desired information may be obtained by the methods 
adopted, the one making the preliminary test eliminates
unsuitable material, modifies some of the items in the 
light of his experience, and brings out the test. He 
usually obtains some tentative standards ; but the limita- 
tions of his time and money usually prevent his carrying 
this phase of the work to a satisfactory conclusion. There 
is, therefore, no test which does not require, after its pub- 
lication, a thorough trial in order to set up norms of 
performance. 
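The setting up of tentative standards from a preliminary test may be sketched as follows. The results are hypothetical, and taking the median score of each grade as its tentative norm is an assumption made for illustration; the text does not say which statistic the test maker uses.

```python
from collections import defaultdict
from statistics import median

def tentative_norms(results):
    """Median score for each grade from (grade, score) pairs."""
    by_grade = defaultdict(list)
    for grade, score in results:
        by_grade[grade].append(score)
    return {grade: median(scores) for grade, scores in sorted(by_grade.items())}

# Hypothetical preliminary results: (grade, problems correct).
preliminary = [
    (4, 6), (4, 8), (4, 9),
    (5, 9), (5, 11), (5, 12),
    (6, 12), (6, 14), (6, 15),
]
print(tentative_norms(preliminary))
```

With so few pupils such norms would be of little worth; the 2,000 to 5,000 children named in the text, and the further trial after publication, are what give the standards their weight.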

For many years workers in the field of educational 
measurements have been devising tests until now we have 
several for most of the subjects in the elementary school.
Many of the older tests have passed through their preliminary
stages and may be said to be satisfactorily
standardized. In some of the tests little attention has 
been given to questions of validity and the reliability of 
the measures which the tests yield. The availability of 
a number of tests for each of the common school subjects
makes urgent a scientific determination of the
validity and reliability of the respective tests in order
that one may have information on which to base a
rational selection. 

In contrast with this long and tedious process, the 
traditional classroom examination consists of tasks of 
varying units of difficulty, worked at for varying amounts 
of time, and producing results of varying degrees of 
quality. The examinations do not measure any one thing. 
They measure a conglomerate of achievements conditioned
by the three factors of quality of product, difficulty
of task, and time consumed. Of course, standardized
tests likewise measure a complex, but the methods
of their derivation, the controlled conditions under which 
they are given, and the norms established make them 
far more reliable than the ordinary classroom examina- 
tion. 

Standardized tests are devised according to scientific 
procedure. The formulation of the questions or statements
of the ordinary examination is based on the judgment
of only one teacher and it often happens that the
teacher does not give them very careful consideration. 
For the ordinary examination there are no standards. 
With the standardized test we have a statement of what 
scores the pupils of the several grades should make. The 
great problem in scientific education is to construct 
objective and universal scales, about the use of which 
there can be no misunderstanding when they are placed 
in the hands of competent teachers. One of the main
improvements due to the use of a definite set of standards
is the elimination of the errors of prejudice.

It is to remedy these shortcomings of our marking
systems that the scientific movement in education has
devised tests and scales. Physical measurements require 
the thing to be measured to possess the quality of con- 
sistency and to remain constant while it is being 
measured. This is a fundamental necessity for logical 
thinking about measuring, counting, or enumerating. 
The things counted must all be of this same category. 
The thing measured must be constant in its character or 
composition so that one unit of it will be equal to any 
other unit of it. 

The process by which the essential characteristic of 
consistency is obtained in educational measurements is 
the one used in physical measurements. It consists of 
distinguishing the possible controlling varying factors, 
devising means of holding all of them constant save one, 
and measuring that one. This is the law of the single 
variable.

A standard test generally has been given to pupils 
of many schools, which makes it possible to compare the 
scores made in one school with those made in another. 
Comparisons, however, must be made with care since 
many factors other than the teaching factor enter in. 

Illustrating the Law of the Single Variable. — The
operation of the law of the single variable may be illustrated
in the methods used by Burgess in designing a reading
test.* The aim of this test was to find out how much
printed material, of a given level of difficulty, a child
could read "well enough for all practical purposes." The 
attempt was to devise a test in which the child could 
readily succeed if he read well enough to grasp the important
thought in each section, and in which he could not
succeed at all unless he did comprehend each important
thought. This was the interpretation which was put upon
the phrase reading "good enough for all practical purposes."

The first step was to determine what the conditioning 
factors were in reading and choose one of these factors 
for measurement. The following is a list of 25 factors
which the author of the test dealt with in measuring silent
reading, with the disposition she made of them:†

To be measured:
    Amount child can do in given time

To be eliminated:
    Complex thought
    Abstract thought
    Technical thought and language
    Catches
    Puzzles
    Accidental leads
    Demands for spatial imagination
    Irrelevant dramatic appeal
    Ability to reproduce
    Ability to remember
    Ability to reason, or infer
    Involved style

To be held constant through the test:
    Memory span requirements
    Attention span, multiple
    Difficulty of action demanded
    Time required for complying with instructions
    Vocabulary difficulty
    Sentence structure
    Word arrangement
    Amount of material to be read
    Uniformity of print
    Uniformity of space relations between pictures and print
    Ease of finding place on paper
    Interest and corresponding effort on part of child

† Ibid., pp. 37-38.



Of course she could not entirely eliminate all the factors
listed under the heading "To be eliminated," neither 
could she hold entirely constant the last group of factors 
in the above list. Of this sort are the factors: Amount
of material to be read; difficulty of action demanded; and
others which could not possibly be kept entirely constant.
The purity of the final score, which was "the amount a
child could do in a given time," was determined by the
degree to which she was able to control some of these 
factors and eliminate others. Unless this could be done 
to a marked extent, her final scores would be a conglom- 
erate, or a measure of many things instead of one, which 
measure would, in fact, be a measure of a complex. 

As was said above, the essential characteristic of con- 
sistency may be obtained in educational measurements only 
by distinguishing the possible controlling varying factors, 
devising means for holding all of them constant save one, 
and measuring that one. This is the law of the single 
variable. 
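The disposition of the factors in the reading test can be written down as a small table and checked against the law of the single variable. The factor names below are those listed in the text; the grouping into a dictionary and the checking function are illustrative only.

```python
# Disposition of the factors, as listed in the text.
factors = {
    "measured": ["Amount child can do in given time"],
    "eliminated": [
        "Complex thought", "Abstract thought",
        "Technical thought and language", "Catches", "Puzzles",
        "Accidental leads", "Demands for spatial imagination",
        "Irrelevant dramatic appeal", "Ability to reproduce",
        "Ability to remember", "Ability to reason, or infer",
        "Involved style",
    ],
    "held constant": [
        "Memory span requirements", "Attention span, multiple",
        "Difficulty of action demanded",
        "Time required for complying with instructions",
        "Vocabulary difficulty", "Sentence structure", "Word arrangement",
        "Amount of material to be read", "Uniformity of print",
        "Uniformity of space relations between pictures and print",
        "Ease of finding place on paper",
        "Interest and corresponding effort on part of child",
    ],
}

def obeys_single_variable(plan):
    """True when exactly one factor is left free to vary and be measured."""
    return len(plan["measured"]) == 1

total = sum(len(names) for names in factors.values())
print(obeys_single_variable(factors), total)
```

The count confirms the 25 factors named in the text: one measured, twelve eliminated, and twelve held constant.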

The reason why marks received by pupils in arithmetic 
and other subjects usually do not actually record their 
abilities and the value of their achievements is that they 
do not measure different amounts of the same thing. 

The law of the single variable is a principle of measure- 
ment taught in the earliest school years and increasingly 
recognized in the comparative judgments of every-day life. 

The tests designed to measure reading ability usually 
measure in addition other and different abilities. 

Time of Day When a Test Should Be Given.— There is 
undoubtedly a best time in the day to give a school achievement
test. If a high score is the thing desired, other things
being equal, the morning, when the children are fresh, is 
better than the afternoon. Fewer errors are liable to be 
made in the early part of the school day. 

There is another problem, however, to be taken into
consideration. It is the question as to how near we want
to make the school conditions approach those in life outside
the school. In our regular work-a-day life we cannot
choose just when we shall do the difficult task. For instance,
a railway passenger agent may have been working
hard all day, and five minutes before six o'clock in the
afternoon twenty-five people may rush up to the ticket 
window and demand tickets to some distant points. It 
may be that the train is already on the track ready to 
depart in three minutes. Those desiring tickets are ex- 
cited because they fear the train will depart before they 
secure their transportation. Under conditions like these the 
ticket agent must answer their many questions, give them 
the proper tickets, answer the telephone, make change 
correctly, and attend to his other duties as ticket agent 
and telegraph operator. The fact that he has worked hard 
all day and is pretty well "fagged out" before the rush 
comes will not excuse him for any errors he may make as 
to the information he gives, the change he makes, or the 
tickets he sells. He is held strictly accountable for every 
act. It would have been to his advantage to have had 
the rush in the early part of the day when [...]
and better able to cope with the situation. [...]
compelled to accept it just as it happened to come. [...]
criticism of our schools is that they are so different from
extra-school life that children prepared in them are not
able to cope with life's situations.

It is true that the children may be nervous and more 
apt to "go to pieces" in the latter part of the school day 
than in the forenoon, but that may be one of the reasons 
for giving the test to see how well they can perform under 
conditions which closely approximate those of extra-school 
life. 

The child who hasn't been trained to withstand condi-
tions of this kind would probably "go to pieces" when
he reaches adult life and attempts to fill a responsible
position.

It is not maintained that one should pick out that part 
of the school day when the children are pretty well fagged 
out and give the test at that time, but, if that part of the 
day happens to be the most convenient time to give the 
test, it may not be wise to postpone it simply because the
children are not as fresh as they are at some other time. 

Number of Times a Test Should Be Given. — A test is 
given to a class for the first time to find out the condition
of the class in reference to any subject. It indicates that 
the class has reached a certain state of proficiency in 
reference to that subject. It gives the teacher a point of
departure. It shows the strong and weak points of the 
class. It indicates where work should be done. It too 
often happens, however, that teachers and superintendents
give a test to find out the condition of a class in reference 
to a certain subject and then never act on the information 
gained. Such procedure is like a physician pronouncing 
a case typhoid fever and then making no effort to treat 
the case. The test should indicate what needs to be done. 
Teachers should act on the information given and test 
again to see what improvement has been made. There are 
some tests which cannot be repeated at short intervals be- 
cause the children would remember enough from the first 
test to vitiate the results if it were given a second time 
with a short time interval. It is usually easy to avoid this 
difficulty, however, because two or more tests of equal
difficulty are usually made in reference to any one subject.
The practice effect is thus avoided. There are certain tests, 
however, such as the Courtis Standard Research Tests in 
Arithmetic, Series B, which may be repeated at short inter-
vals without the practice effect vitiating the results. It
would be a rare case for a pupil to remember the answers 
to any of these problems more than a day or two. Owing
to the nature of the Monroe Silent Reading Tests, however,
the children might remember the correct answers to them 
for several months. 

We give tests for the same reason that a merchant in- 
voices his goods. He cannot determine his profits until 
he makes at least two invoices. Neither can a teacher 
determine the progress of her pupils until she makes at 
least two tests, the first one to determine a point of de- 
parture, and the second, or succeeding ones, to determine 
the progress made. Without the tests the teacher is simply
guessing at the progress made, but she never really knows
to what degree her efforts have been rewarded. 
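
The comparison with the merchant's two invoices may be put in arithmetical form. The following is a minimal sketch only; the pupil names and scores are hypothetical and are not taken from any test discussed here. It subtracts each pupil's first score from his second to show the progress made between the two tests.

```python
def progress(first_scores, second_scores):
    """Return the gain (second score minus first) for each pupil tested twice."""
    return {
        pupil: second_scores[pupil] - first
        for pupil, first in first_scores.items()
        if pupil in second_scores
    }

# Hypothetical scores (examples done correctly) on two tests of equal difficulty.
fall = {"Anna": 6, "Ben": 9, "Clara": 4}
spring = {"Anna": 10, "Ben": 9, "Clara": 7}

gains = progress(fall, spring)
print(gains)
```

A pupil who took only one of the two tests is left out, since no progress can be computed from a single score.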

How Standard Tests Are Helpful in Improving Instruc- 
tion. — Standard tests render assistance to the teacher in 
three ways. (1) Since the test has been constructed after 
a careful analysis and survey of the field, it gives a teacher 
a list of things that the pupil should be able to do. This 
is well illustrated by the Ayres Spelling Scale, which con-
tains the 1,000 most frequently used words, and Monroe's
diagnostic tests in arithmetic, which give a list of the 
significant types of examples in certain fields. (2) Since 
the tests are standardized, the teacher may know just what 
scores pupils ought to make. She is, therefore, given a
definite objective aim to strive for in her teaching, an aim 
which the pupil can understand. The advantages of a 
definite standard are obvious. (3) Tests furnish the 
teacher with information concerning the abilities of her 
pupils. They point out phases of strength and weakness. 
With this information at hand she can plan instruction 
which will be more efficient since it will meet specific needs.

A test cannot be used properly unless it is accompanied 
by a complete set of instructions for giving it, for scoring
the test papers, and tabulating the scores. For tabulating 
the scores a special class record sheet is usually provided. 

The value of standard tests is realized through the use
that is made of the scores. They must be interpreted in
terms of the needs of the pupils for instruction. This
means doing more than determining whether the scores 
are above standard, at standard, or below standard. It 
means doing much the same sort of thing the physician 
does when, after he has ascertained the patient's pulse 
rate, temperature, and other symptoms, he prescribes treat- 
ment. 

The use of standard tests should result in the improve- 
ment of instruction, and this is accomplished by means 
of the information which the tests yield concerning the 
abilities of the pupils. Growing children do not develop
skill by instruction or personal exertion on the teacher's 
part. They develop best when they are inspired to vol- 
untary effort. Mere repetition does not develop skill; it 
is repetition accompanied by a conscious desire to improve 
that brings results. Credit should be given for growth, 
not for the number of lessons completed. 

It is easy to get a child to try once, but he will not keep 
on trying unless his efforts bring success. Standard tests 
automatically set for each child a task within his reach. 
He knows when he gets it done. 

Speed tests are timed in order to "speed up" the chil- 
dren's work. Teaching a child to concentrate and to work 
efficiently does not mean prodding him to hurry. Speed 
is to be acquired by study and practice, not by special 
effort. The amount of work done in a given time is merely 
a symptom which may indicate to the teacher whether or 
not the child has studied the lesson sufficiently. 

It is only by measuring the initial ability of children 
in the fall and the final ability in the spring that a teacher 
may know the progress made. 

What Kind of School Achievement Tests Is Most
Important? — Much discussion has been carried on as to
the relative importance of various types of tests. It is
sometimes argued that the Courtis Standard Research
Tests in Arithmetic, Series B, for instance, are not as 
valuable as other tests in arithmetic dealing with the more 
complex processes. To debate whether the fundamentals 
in arithmetic are more, or less, important than the more 
complex processes is analogous to debating which is the 
more important, the foundation of a house or the super- 
structure upon it. One cannot exist without the other. 
The four fundamentals in arithmetic constitute the foun- 
dation upon which other more complex processes are built 
and are absolutely necessary to the computation of these 
complex problems. The point is that we need tests in both 
fields. Tests in the fundamentals will not give informa- 
tion relative to the higher and more complex processes, 
and tests in the complex processes give information only 
indirectly as to the fundamental processes. 




THE CLASSIFICATION OF SCHOOL ACHIEVEMENT TESTS AND THE
FUNDAMENTAL PRINCIPLES FOR DESIGNING THEM

School achievement tests show many lines of cleavage. 
Just as it is possible to classify school subjects as sciences,
arts, and volitions, or as formal subjects, content subjects, 
and expression subjects, so it is possible to classify tests 
and measurements into many groups depending upon the 
particular features one has in mind when the classification 
is made. In order to bring out clearly some of the more 
basic principles in designing tests we shall note some of the 
classifications. 

Diagnostic vs. General Tests. — One of the first problems 
that confront those making a test is whether or not the 
test shall be diagnostic. By a diagnostic test we mean one 
which furnishes a separate measure for each specific ability 
in the field of the test, or at least, of the important abilities. 
Scientific investigation has shown that in a subject such
as arithmetic there is not simply one ability, but a large 
number of abilities. A pupil may be good in addition, 
poor in subtraction, and weak in multiplication. It is a 
rare case when a pupil is equally proficient in the several 
abilities. Each school subject, therefore, includes a num- 
ber of abilities which are specific or distinct from each 
other to a considerable degree. The degree to which a 
test is diagnostic depends chiefly upon the amount of sub- 
ject matter in that particular field. The Charters Language 
Tests for Pronouns, for instance, are diagnostic in that the
number of pronouns is limited and it is possible to give
questions and examples using each of the pronouns in 
almost every conceivable way in which they are used in 
oral and written composition. Such a test may give a 
complete diagnosis of the child's language abilities in the 
use of pronouns and any shortcomings may be quickly 
located and corrected. 

A general test is, in a sense, opposite to a diagnostic test. 
It yields average or composite measures of a child's ability 
in the subject matter in question. It may be illustrated 
by the miscellaneous language test devised by Charters or 
by the Courtis supervision tests in arithmetic. The latter 
is a single test covering the entire field of all the operations 
with integers. The test gives the pupil's general or average
standing. 

Grade norms cannot be used to make individual diag-
nosis. But we can see by them which children are below
and which are above the level that they should attain in 
their grade. Grade norms will not give administrators 
what they most need to know, namely, which children have 
progressed at the rate normal for their age and native
capacity, and which are performing at their maximum. 

Degree to Which Tests Are Diagnostic — The diagnostic 
characteristics of tests range all the way from zero, where 
the test is so general that specific abilities are completely
lost sight of, to tests which may be said to be 100 per cent 
diagnostic. We may illustrate this fact by examining some
of the handwriting scales. Both the Thorndike and the 
Ayres handwriting scales may be considered almost zero 
as far as measuring specific abilities is concerned. When
these scales are used in measuring handwriting the sample 
to be measured is moved along the scale until the quality 
is reached that most resembles the sample to be measured. 
The specific characteristics of the sample are not taken into 
account. When the score is made up, it is a record of the 



184 EDUCATIONAL MEASUREMENT 

two of the abilities. This method was abandoned because
it would require a choice among those parts for which the 
correlation is high and then among those where the correla- 
tion is not so high. This would resolve itself into an 
arbitrary choice which was the very thing he was attempt- 
ing to avoid. 

A second plan which suggested itself was to submit the 
list to a representative body of teachers and allow them 
to choose what phases should be measured. This plan was 
objected to on the ground that it was not known in advance 
how many elements must be chosen and it was doubtful if 
mutually exclusive points could be obtained in this way. 

The plan finally adopted for determining what char- 
acteristics should be used in making a score card for writ- 
ing was one which grew out of experience with the card 
itself together with the arbitrary choice of the author. A 
number of students were asked to grade samples of hand- 
writing each week with the crude score card with the idea 
of determining what points should be used in order to 
give a complete account of writing. The students were 
cautioned to watch for points in the writing which were 
not covered by the card that they were using. In the light 
of the experience thus gained it was found that writing 
might be rather completely described under nine headings. 
These are: (1) spacing of letters; (2) spacing of lines; 
(3) spacing of words; (4) slant; (5) size; (6) alignment; 
(7) neatness; (8) heaviness; and (9) the formation of 
letters. Neatness, which is, in part, a function of other 
elements, was finally included to take account of such 
points as blotches, carelessness, and retracing. 

Many of these headings were subdivided in order to 
increase the diagnostic value of the scale. The formation 
of the letters, for instance, was subdivided into: (a) parts 
omitted; (b) letters not closed; (c) parts added; (d)
smoothness; (e) general form.

A diagnostic test of a different nature has been designed
by Dr. Judd and was first used in the Cleveland survey. 
This is a series of 15 arithmetic tests, each composed of 
different types of examples. This series of tests has been 
called spiral because the abilities of the pupils are
measured on successive levels of difficulty. The series is
diagnostic rather than general. It yields measures of rate 
and accuracy with which the pupils do certain types of 
examples. The advantage claimed for the spiral method
is that it offers a means for distinguishing errors due to 
accident from errors due to ignorance or incapacity. 
Errors of the latter sort will recur at regular intervals 
and may be readily recognized. Another argument for the 
spiral tests is the fact that time will not permit as many 
problems in a particular field as is desired so that each
distinct mental operation may be thoroughly tested; 
hence, the device of a spiral arrangement, by which several 
related operations are combined in the same test in cyclic 
order. 
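
The use of the spiral arrangement to separate accidental errors from errors of ignorance may be sketched as follows. This is an illustration under stated assumptions and is not Dr. Judd's scoring procedure: the type names, the marks, and the rule that a type missed more than once is "recurring" while a type missed exactly once is "isolated" are all hypothetical.

```python
from collections import defaultdict

def classify_errors(example_types, correct):
    """Tally misses per example type in a spiral (cyclic) test.

    example_types: type label of each example, in test order.
    correct: True/False for each example, in the same order.
    A type missed more than once is labelled "recurring" (possible
    ignorance); a type missed exactly once is labelled "isolated"
    (possible accident). This two-way rule is an illustrative assumption.
    """
    if len(example_types) != len(correct):
        raise ValueError("example_types and correct must be the same length")
    misses = defaultdict(int)
    for kind, ok in zip(example_types, correct):
        if not ok:
            misses[kind] += 1
    return {kind: ("recurring" if n > 1 else "isolated")
            for kind, n in misses.items()}

# Hypothetical spiral: three types repeated in cyclic order three times.
types = ["carrying", "borrowing", "zeros"] * 3
marks = [True, False, True,
         True, False, True,
         False, False, True]

result = classify_errors(types, marks)
print(result)
```

Because each type returns at a regular interval, the tally by type is all that is needed to see which errors recur.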

The Courtis Standard Research Tests, Series B, are 
general tests in the field of each operation but diagnostic 
to the extent that they give information for each of the 
four fundamental operations: addition, subtraction, mul-
tiplication, and division.

Formal Tests and Reasoning Tests. — Tests may be
classified, from the standpoint of the kinds of mental proc-
esses involved, as formal tests and reasoning tests. For
measuring skill or automatic processes we use the former
type. These tests measure immediate specific and prepara- 
tory outcomes of school training. The Courtis Standard 
Research Tests, Series B, are illustrations of formal tests.
On the other hand, we may design tests to measure general- 
ized outcomes of school training. The Stone Reasoning 
Test in arithmetic may be classed as an example of this
kind.

Rate Tests and Development Tests. — Another line of
cleavage allows us to divide the tests into rate tests and
development tests. Courtis and Rugg¹ especially use this
terminology. Rugg makes three distinctions between these
two kinds of tests. (1) Rate tests are distinguished from
the development tests in that the latter make use of the
"time" factor only incidentally or not at all. That is, the
students are given practically all the time they need to do
the test and their scores are reckoned only slightly, if at
all, in terms of time. (2) The rate test differs from the
development test in that the latter is made up of all kinds
of subject-matter ranging from the purely formal and
automatic material on the one hand, to complicated reason-
ing problems on the other. By rate tests we have in mind
those types of tests in which "the ability involved in the
working of any one problem is roughly the same as that
involved in the working of any other."² Of course, the
ability involved in the solution of the various rate problems
is not exactly the same as that involved in the solution of
others in the test; but the same general type of mental
processes is involved. (3) A third distinction is based on
the organization or arrangement of the parts of the tests.
In the rate tests one of two plans is followed. Either
problems involving the same mental processes are grouped
together in one test, or there is combined in one test a
group of problems involving very closely related abilities.
"The difficulty of each example in the test has been deter-
mined and the examples have been arranged in terms either
of a rotating or a cycle principle, or all problems of the
same difficulty have been put together."³ In the develop-
ment test the difficulty of examples has been determined
as in the case of the rate test, and the examples are
arranged in order of increasing difficulty. Different kinds
of abilities are measured in the development test.

¹ Scientific Method in the Reconstruction of Ninth-Grade Mathematics,
Supplementary Educational Monographs, Vol. II, No. 1, 1918.
² Ibid., p. 66. ³ Ibid., p. 66.

Quality, Difficulty, and Time or Amount Tests. — In
general it may be said that the standard educational meas- 
urements fall into three clearly defined groups according 
to which of the three variables we seek to measure. They 
are tests and scales for quality of product, for difficulty 
reached, and for amount done. 

1. Quality tests attempt to answer the question, How
well can a pupil perform a certain task? Tests in com- 
position, drawing, and handwriting are primarily quality 
tests. They are not scored on a basis of right and wrong;
but there are all degrees of merit from the lowest to what
might be called perfection. In writing, for instance, there
is no right or wrong. The merit simply ranges from less 
good to more good through a continuous series of degrees 
of quality. 

Reading is a classroom activity which does not readily 
lend itself to measurement by means of scales of quality. 
One reason for this is that it does not result in a tangible 
objective product which can be scrutinized and measured. 
Another reason is that quality in reading is an elusive
thing which varies not only with different people but with 
the same person from moment to moment as he reads. 

2. Difficulty tests attempt to answer the question, How
hard a task can a child do? They are measured in terms
of right and wrong. The question may be, How hard a
word can a child spell? How difficult a problem can a 
child solve in arithmetic? In scales for difficulty, the 
variable which is measured is the difficulty of the work 
which the child can do. The difficulty of successive tasks 
is carefully increased and controlled and the child is al- 
lowed to overcome it if he can. The quality of his work 
must be high enough to be considered "right."

The commonest of all classroom questions is probably that
which relates to the difficulty of the work which the child
can do correctly. The usual method which has been 
adopted to answer these questions is to prepare a series of 
tasks carefully graded in difficulty. Those near the be- 
ginning of the series are so easy that almost any child 
in the group can do them. As the series progresses the 
questions become increasingly more difficult. In scales for 
difficulty the amount of time allowed for the test should 
have no effect on the score. The independence of time 
must hold, not only for most difficult problems near the 
end of the series, but also for easy problems. There are
some types of difficulty problems that a child can answer 
correctly at once or not at all. Spelling illustrates the 
point. If a child cannot spell a word immediately, no 
amount of time will aid him in his efforts. Spelling and 
arithmetic, and, less definitely, geography, history, and 
grammar constitute appropriate subject-matter for diffi- 
culty tests. Some of these are informational subjects in 
which, by the common verdict of society, the information
is only valuable if it is accurate and correct. A type of 
handwriting that is somewhat inferior to another sample 
may be of almost equal practical value. The same cannot 
be said of spelling, arithmetic, history, geography, or 
grammar. Classroom products in these subjects are judged 
as right or wrong.

The student who devises a scale for difficulty, then, must
either present evidence to show that scores for the par- 
ticular ability he seeks to measure are not affected by dif- 
ferences in time, or he must devise methods by which the 
amount of time the pupil is allowed to spend on each task 
within the series may be controlled and recorded. The 
development tests discussed above have many character-
istics of the difficulty tests here under consideration, but
are not like them in every particular. 

3. Time tests, or tests for the amount done, are, in reality,
the rate tests discussed above. Some additional character-
istics, however, will be given under this heading. It is
to be noted that time and amount are complementary terms,
each of which depends upon the other for its meaning.
Time implies amount, and amount implies time. The ques-
tion, How much can be done? demands a statement of the
time allowed for doing it, and the question, How long will
it take? depends upon how much there is to do. This
variable may be handled in two ways: (1) The problem
may be to determine how much can be done in a given
time as in the Courtis Arithmetic Tests, Series B. In this
test the time is eight minutes for addition, for instance,
and the information sought is, How many problems can
be solved in that time? (2) A definite amount of work
may be given and the problem is to determine how long
it takes to do it. The latter method would not work well
where the test is given to a large group of children at the
same time, since their finishing times would be different
and therefore hard to record. Quality, difficulty, and time
or amount tests may be illustrated by the athletic contests
we carry on in our schools. Marksmanship is a measure
of quality answering the question, How well can one shoot?
The other variables of difficulty and time are kept constant. 
The high jump is a measure of difficulty; quality, which 
is good enough to clear the bar, and time are kept constant. 
The 100-yard dash is a time test. 

The three questions, how well, how hard, and how fast, 
represent the teacher's attempts to measure the three 
fundamental factors of quality, difficulty, and time or 
amount. The educational tests and scales that have been 
devised during the past ten years are attempts to help her 
answer these questions and each of them seeks to measure 
some one of those same three fundamental factors. 
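
Two of the three questions, how hard and how fast, lend themselves to a simple arithmetical sketch. The following is an illustration only. The function names, the spelling levels, and the count of twelve problems are hypothetical, and taking the single hardest task done right as the difficulty score is an assumption rather than the rule of any published scale. Quality is omitted because it is judged against a scale of samples and not counted.

```python
def hardest_task_passed(results):
    """Difficulty score: highest difficulty level among tasks done right.

    results: list of (difficulty_level, is_right) pairs.
    Returns 0 if nothing was done right. Taking the single hardest
    success as the score is an illustrative assumption.
    """
    return max((level for level, is_right in results if is_right), default=0)

def rate_from_fixed_time(amount_done, minutes_allowed):
    """Way (1): the time is fixed and the amount done is counted."""
    return amount_done / minutes_allowed

def rate_from_fixed_amount(amount_set, minutes_taken):
    """Way (2): the amount is fixed and the time taken is recorded."""
    return amount_set / minutes_taken

# Hypothetical pupil: spelling words graded 1 (easy) to 5 (hard).
spelling = [(1, True), (2, True), (3, True), (4, False), (5, False)]
print(hardest_task_passed(spelling))      # difficulty reached

# Eight minutes for addition, as in the text; 12 problems is a made-up count.
print(rate_from_fixed_time(12, 8))        # problems per minute
print(rate_from_fixed_amount(12, 8))      # the same rate, measured the other way
```

The two rate functions perform the same division, which is the arithmetical meaning of the statement that time and amount are complementary terms.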

Measurements by Opinion, Comparison, and Standard-
ized Tests. — The methods for the measurement of school
achievements may again be classified according to other
principles as follows: One method, and the one generally
in vogue, is that of personal opinion. Measurements by
this method are valuable just to the extent to which the 
persons passing judgment are qualified to give expert 
opinion. As was noted in the introductory chapter, per- 
sonal opinion, even though expert, is worthless in the face 
of facts unless the opinion happens to agree with the 
facts. 

A second method of measuring school achievements is 
by comparison. We may say that one pupil, or class, 
or school, ranks fifteenth as compared with 25 other 
pupils, or classes, or schools. This method has merit, 
but the chief criticism of it is that it does not measure 
in reference to a standard. Even though a class, or 
school, ranked first among 25, it might still be a poor 
class or school. The ranking or comparison method is 
used to a greater extent than most teachers are aware of, 
however, when monthly report cards are made up. They 
usually reason that A is a better student than B, therefore 
B is given a grade of 80 per cent while A is given 85 
per cent. 
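
The weakness of measurement by comparison may be shown with a few figures. In the sketch below the class averages and the standard are hypothetical. The class that ranks first is found and its average is then compared with the standard.

```python
def rank_of(name, scores):
    """Return the 1-based rank of `name` when scores are ordered high to low."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(name) + 1

# Hypothetical class averages on some test, and a hypothetical standard score.
class_averages = {"Class A": 52, "Class B": 47, "Class C": 41}
standard = 60

best = max(class_averages, key=class_averages.get)
print(best, rank_of(best, class_averages))   # the class ranking first
print(class_averages[best] >= standard)      # whether it reaches the standard
```

With these figures the first-ranking class still falls short of the standard, which is the criticism made above of ranking alone.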

Measurement by comparison is based on the funda- 
mental idea that the common practice is the result of the 
judgment of many men who have attempted to solve the 
same, or very similar problems. The order of merit 
method is used where units are not definite. 

A third method of measurement and the one about 
which we are primarily concerned here is measurement
by scientifically determined standards and units. This 
is the greatest contribution which has been made to educa- 
tion in the last twenty years. 

Classification by Educational Tests. — Thus far we have 
discussed the general classification of tests. We shall now 
note still a different group of educational measures and
how they may be used in the reclassification of pupils
in the schools, and also how they may be used to measure 
educational processes and products. It is obvious that 
if we continue to teach pupils in classes we must so 
classify them that those of approximately the same men- 
tal ability will be together. Experiments made seem to 
indicate that approximately 25 per cent of the pupils in 
any grade belonged mentally to a lower grade and about 
25 per cent belonged to a higher grade. If the teacher
does justice to the upper 25 per cent, he is over-teaching
the other 75 per cent, or if he addresses the average 
pupils, he is under-teaching the upper 25 per cent and 
over-teaching the lower 25 per cent. Fortunately, we are 
developing methods for the reclassification of pupils 
which will make the pupils in the various classes
more nearly homogeneous. Both intelligence and school
achievement tests are employed in this reclassification. 
Franzen, McCall, and others are making use of the edu- 
cational quotient (E. Q., educational age divided by 
chronological age) as a means of reclassifying students 
and measuring their school progress. 

In order to compute a child's educational age in any 
subject, it is necessary to give a series of tests to a large 
number of pupils and determine the norms for children 
of different ages in that subject. For example, suppose 
an addition test were given to a large number of chil-
dren ranging in age from eight to fourteen years. Sup-
pose further that the average number of problems solved
by children chronologically eight years old was four; 
those eight years, six months old, six; those nine years 
old, eight; those nine years, six months old, ten, and so 
on. Then a child doing six problems would be considered 
eight years, six months old educationally in that test, 
irrespective of his chronological age. If a great number 
of tests were given to a child in the subject of arithmetic
and a composite score were computed and compared with
the norms for children of the various ages, a child's educa-
tional age in arithmetic might thus be determined. In a
similar way a child's educational age may be computed
in other subjects. 
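
The computation just described may be sketched as follows. The four norms are those of the example above, with ages expressed in months. Everything else is an assumption made for illustration: the function names, the linear interpolation for scores falling between two norms, the refusal of scores outside the table, and the hypothetical child ten years old. The educational quotient is taken, as defined above, as educational age divided by chronological age.

```python
# Age norms from the example in the text: (age in months, average problems solved).
NORMS = [(96, 4), (102, 6), (108, 8), (114, 10)]

def educational_age(score, norms=NORMS):
    """Return the educational age in months for a test score.

    Scores that fall between two norms are interpolated linearly, and
    scores outside the table raise an error; both choices are assumptions.
    """
    for (age_lo, score_lo), (age_hi, score_hi) in zip(norms, norms[1:]):
        if score_lo <= score <= score_hi:
            fraction = (score - score_lo) / (score_hi - score_lo)
            return age_lo + fraction * (age_hi - age_lo)
    raise ValueError("score lies outside the range of the norms")

def educational_quotient(educational_age_months, chronological_age_months):
    """E. Q. = educational age divided by chronological age."""
    return educational_age_months / chronological_age_months

ea = educational_age(6)              # a child who does six problems
eq = educational_quotient(ea, 120)   # hypothetical child aged ten years
print(ea, eq)
```

A child doing six problems is placed at 102 months, that is, eight years, six months, in agreement with the example, whatever his chronological age.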

Educational Age and Mental Age Compared. — It is 
clear that educational age would be a better measure for 
the classification of pupils than mental age because the 
former represents the actual accomplishment of the pupil 
— the pupil's educational status — while the latter simply 
shows potential ability. The factors determining educa- 
tional age are both hereditary and environmental. At 
the present time we have fairly well-established grade 
norms in the various subjects but do not have educational 
age norms. Suppose educational age norms were es- 
tablished, how could we utilize them in the classification 
of pupils? In what ways are they superior to mental ages? 
If age norms are used, it will be possible to reclassify 
pupils and put each one on his proper educational level. 
When the educational age is used as a basis of promotion 
or demotion, it takes cognizance, not only of the pupil's
mental age which is potential, but also what he has ac-
tually done, which is one of the best ways of prophesying
what he is able to do.

If educational age is to be used in the reclassification 
of pupils, how much better than the average of his class 
must a pupil be before he is allowed to skip a grade and 
enter a higher class? Or, how much poorer than the 
average of his class must he be before it is considered 
wise to fail to promote him or even to demote him? Here 
both the E. Q. and I. Q. are taken into consideration. As
was indicated above, the best way to prophesy what rate 
of progress a child can make is to find what rate he has 
made. There is a rather high correlation between the 
educational age of a pupil and his mental age. Only
tentative answers can be given to these questions since
not enough work along these lines has been done to be
entirely sure as to what is the best thing to do. It is the
opinion of some investigators that the reclassification of
pupils in a relatively small school should be somewhat
as follows: No pupil should be allowed to skip a grade
unless his educational age exceeds the educational age 
of the lowest 25 per cent in the grade to which he pro- 
poses to enter. In the matter of demotion and failure to 
promote, no child shall be denied his normal promotion 
nor be demoted unless his educational age falls within 
the lowest 25 per cent of his class. These rules not only 
take cognizance of what the child has done but also his 
potential power. They do not mean, of course, that simply
because a child belongs to the lowest 25 per cent of his
class, he shall be demoted, or because he exceeds the
average of his class, he shall skip a grade, but they mean,
if a pupil were an average pupil in class, that he would
not be denied promotion nor be demoted.

Just as a mechanic working in a garage has a rather
definite procedure in diagnosing engine trouble, so an 
educational expert soon develops a rather definite pro- 
cedure in diagnosing educational ills. When a child fails 
in promotion, one of the first things is to get his mental 
age. There is usually a substantial correlation between 
mental age and school marks, and a child's mental age 
may throw much light on his failure to be promoted. 
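
The two 25 per cent rules for reclassification stated earlier may be sketched as follows. The sketch treats them as necessary conditions only, as the text insists. The function names and the way the "lowest 25 per cent" cutoff is taken (the highest educational age among the lowest quarter of the pupils) are assumptions made for illustration.

```python
def lowest_quarter_cutoff(educational_ages):
    """Highest educational age found among the lowest 25 per cent of a class."""
    ordered = sorted(educational_ages)
    k = max(1, len(ordered) // 4)  # size of the lowest quarter, at least one pupil
    return ordered[k - 1]


def may_skip(pupil_ea, next_grade_eas):
    """A skip is permissible only if the pupil exceeds the lowest quarter
    of the grade he proposes to enter."""
    return pupil_ea > lowest_quarter_cutoff(next_grade_eas)


def may_be_held_back(pupil_ea, own_class_eas):
    """Non-promotion or demotion is permissible only for a pupil whose
    educational age falls within the lowest quarter of his own class."""
    return pupil_ea <= lowest_quarter_cutoff(own_class_eas)


# Hypothetical educational ages, in years, for two small classes.
grade5 = [9.0, 9.5, 10.0, 10.25, 10.5, 10.75, 11.0, 11.5]
grade6 = [10.0, 10.5, 11.0, 11.25, 11.5, 11.75, 12.0, 12.5]

print(may_skip(11.0, grade6))          # True: above the lowest quarter of grade 6
print(may_be_held_back(10.0, grade5))  # False: not within the lowest quarter of grade 5
```

A result of True only removes the bar; whether the pupil is in fact moved remains a matter of judgment.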

Accomplishment and Educational Quotients. — The ac- 
complishment quotient (A. Q.) of a pupil is found by 
dividing his educational age by his mental age. It shows 
the degree a pupil's actual progress approaches his poten- 
tial progress. This is perhaps the best measure we have 
of the instruction and the application of the pupil. When 
a report of this kind is taken home, the parent may form
some intelligent idea as to whether the pupil is working
up to his optimum capacity. He may find out whether
the pupil is progressing at a rate normal for his mental 
and educational ages. This also becomes a protection to 
teachers. If they can show that all the pupils under their 
guidance have progressed at a rate normal to their general 
intelligence, they have a good defense for adverse criticism 
of their teaching. 

Another unit of measure that is being employed in 
measuring educational processes and products is the educa-
tional quotient (E. Q.), which is the educational age
divided by the chronological age. It is the division of what 
is, by what it would be if the pupil were normal. It gives 
the percentage of normality. The quotient thus derived 
will indicate whether the pupil has made normal progress 
for his age or whether he has progressed more rapidly or 
slowly than the average. Indirectly, at least, the educa- 
tional quotient throws some light on the pupil's general 
intelligence. A high educational quotient usually signifies 
a high intelligence quotient, but not always. Well taught 
pupils with a low I. Q. may have relatively high educa- 
tional quotients. 
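
The two quotients may be computed as in the sketch below. Expressing each as a percentage (multiplying by 100) follows the text's phrase "percentage of normality" for the E. Q.; applying the same convention to the A. Q. is an assumption, as are the function names and the sample ages.

```python
def accomplishment_quotient(educational_age, mental_age):
    """A. Q.: educational age divided by mental age, as a percentage."""
    return 100 * educational_age / mental_age


def educational_quotient(educational_age, chronological_age):
    """E. Q.: educational age divided by chronological age, as a percentage."""
    return 100 * educational_age / chronological_age


# Hypothetical pupil, ages in months: chronological 120, mental 144, educational 132.
print(educational_quotient(132, 120))                 # 110.0
print(round(accomplishment_quotient(132, 144), 1))    # 91.7
```

This pupil is ahead of his chronological age (E. Q. above 100) yet is not working up to his mental age (A. Q. below 100), which is the distinction the two quotients are meant to draw.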

When pupils are classified according to their native 
capacities and educational ages, we may then begin to judge 
the quality of the teaching more intelligently. A reason- 
able increment or interest on the mental capital invested 
will be all that is expected of teachers. Educational norms 
will serve as goals for teachers and pupils. Parents and 
supervisors will not expect increments out of proportion
to the mental capital invested. On the other hand, teachers
will know when pupils are doing an honest day's work,
that is, when they are accomplishing a normal amount for
their general intelligence and educational level.

Principles for the Choice of Subject-Matter for Educa-
tional Tests and Scales. — What shall be the controlling
factors which will determine the type of subject-matter
used in tests? Many factors enter into this problem, some
of the more important of which we shall discuss here. As
was discussed in the previous chapter, a scale is a ladder, 
or a linear rule, extending from the worst to the best, 
from the lowest to the highest, or from the easiest to the 
hardest, and indicates the steps or degrees by which inter-
mediate achievements may be gauged. Now it is evident
that one making a scale of progressive degrees of difficulty
may seek subject-matter with only this idea in mind and 
pay no attention to the practical or utilitarian value of 
the subject-matter that enters into the scale. This method 
is sometimes called the statistical method as opposed to the 
analytical method, which does take cognizance of the sub- 
ject-matter field from which the material is to be chosen. 
In Woody's arithmetic scales, for instance, he states that
his "fundamental idea was to derive a series of scales that
would indicate the type of problems (examples) and the
difficulty of the problems (examples) that a class can solve
correctly."¹ This method assumes that an example is
suitable for use in a test simply because it is done cor-
rectly by a gradually increasing per cent of pupils as
one proceeds from grade to grade. He selected his test
material not on the basis of its arithmetical significance
but on the basis of the consistency of the pupils' reactions.
It seems to the author that his procedure is open to
criticism on the ground that arithmetic is a tool subject
and there is little use in testing the ability of pupils to
use a tool that they will rarely or never be called upon
to use in life outside the school.

It seems that in the tool subjects, at least, one must
determine the subject-matter for tests and scales very
largely on the basis of its utilitarian value.

¹ Measurements of Some Achievements in Arithmetic, Teachers College
Contributions to Education, No. 80 (1916), p. 1.

Information Desired Determines the Type of Subject-
Matter. — Tests are of three general types, difficulty tests,
speed tests, and quality tests. The kind of subject-matter
used in the test will be determined by the kind of in-
formation desired. If a teacher wants to know how
rapidly a pupil can do arithmetic, for instance, the prob-
lem is one of choosing subject-matter of a constant level
of difficulty and of uniform quality, which is that quality
which the public has been accustomed to call right. The 
subject-matter should be chosen from fields that supply 
the tools for doing society's work in arithmetic. The rate 
test will differ from the difficulty test since in the former 
but one type of subject-matter is usually chosen, while 
the difficulty test usually includes several types. 

Some Characteristics of an Ideal Educational Scale

An ideal educational scale must have at least the follow- 
ing characteristics: (1) it must have an accurately defined 
zero point; (2) the steps above the zero point must be of
equal magnitude; (3) the scale must measure the desired
educational product; (4) it must be so simple in its ap-
plication that it is adapted to the classroom; (5) it must
not require an undue amount of time in administration. 
We shall note briefly the significance of each of these 
characteristics. 

1. The Establishment of a Zero Point. — In education
as in the physical sciences two positions may be taken
as the zero point on the scale. The first may be called the
absolute zero, which means just not any of the thing in
question. In measuring heat, for instance, absolute zero
is 273 degrees below the point we ordinarily call zero on
the centigrade scale and means that all the heat has passed 
out of the thing in question. Establishing an absolute 
zero point in education is a difficult thing to do. The zero
point on the Hillegas Composition Scale was determined 
by 40 professors of English, editorial writers, psychologists, 
and educational experts. The composition with zero merit 
reads as follows: 

Dear Sir: I write to say it aint a square deal. Schools is I
say they is I went to a school. Red gree green and brown aint it
hit to a bit I say he don't know his business not today not
yesterday and you know it and I want Jennie to get me out.

That is probably a little better than zero. There is a little 
merit concealed in it but not enough probably to prejudice 
the matter seriously. The fact that zero points do not 
stare us in the face in the case of mathematical originality,
or knowledge of German, or ability in writing as they do 
in the case of measures of length, and weight, and time, 
is no excuse for not trying to get them. If we get scale 
points defined and their distances defined and established 
in reference to an absolute zero, there is no further diffi- 
culty in constructing a scale to measure mental achieve- 
ments. Such scales have every logical qualification that 
any of the scales in the physical sciences have.

Zero points on scales are imperfectly known, and as a 
result we add and subtract educational quantities with 
much less precision than desirable. We cannot say that one 
product is twice as good as another, or one task twice as 
hard as another, or that one improvement is twice as great 
as another unless we establish a zero point. Statements 
of these kinds are intricate and subtle matters involving 
presuppositions which must be kept in mind. 

The ordinary scale for weight exemplifies an ideal scale
in four respects. First, it is a series of perfectly definable 
facts. All men the world over know exactly what is meant 
by two grams, four grams, etc. In the second place, each 
amount is a definite amount of the same kind of thing. In 
the third place, the difference between any two of the
amounts is perfectly defined in terms of some unit of dif-
ference. The step from four to five grams is the same as
from six to seven. Lastly the zero point of the scale is 
absolute. That is, it means just barely not any of the thing 
in question. 

Thorndike acquired an actual zero produced by a human 
being in penmanship. He has a signature of a letter which
cannot be read in toto and in which no letter can be read 
by any one of hundreds who have tried. He defined zero 
as the suppositional handwriting such that, though recog- 
nizable as handwriting, it has no legibility, no beauty, no 
value as penmanship. When we have used zero in education in
the past, we have usually had in mind a relative zero. The
value of this zero is usually a subjective one, hence, ill- 
adapted to measure educational products. 

Education is having the same difficulty that the natural 
sciences passed through in the standardization of their units.
In the matter of recording temperature, for instance, scien- 
tific progress was handicapped by the fact that different 
individuals were using different reference points when 
measuring temperature. Finally, after long and costly 
delays, the methods of measuring temperature were reduced
to two competing systems, one of which took the freezing
point of water as the point of reference and the other took 
a point 32 degrees below that point. In measuring the 
height of land forms scientists agreed to take sea level as 
the point of reference. Other things might have been taken 
but they would not have been so universally applicable,
and, if a number of different things had been taken, much 
time and labor would have been needed to convert one unit 
into another in order that comparisons might be made. 

In education the tendency has been to search for some 
absolute zero point for the trait being measured. This is 
obviously a difficult task and results in much confusion, 
even assuming that it can be scientifically determined, be-
cause each particular test will have its own zero point.
Many methods are used in the location of these zero points
and points of reference. McCall¹ mentions six methods
of locating the zero points in scales now extant: (1) the
reference point on unscaled tests is just no score on the
material of the particular test; (2) the zero point is guessed
at by the author of the scale; (3) the reference point on
judgment scales is the median judgment of judges as to
the location of zero merit in composition, handwriting, and
the like, as in the cases cited above; (4) the zero point is
located by the use of the per cent of pupils in some early
grade who make no score on very easy material; (5) the
reference point for other scales is three times the standard
deviation² below the mean of the group for whom the test
was devised; (6) the reference point is simply the lowest
score made. There are still other methods for locating
points of reference in scale building.

Since there is a lack of agreement as to just what the
zero point in reading and other subjects should be, McCall
proposes to take as the reference point for school achieve-
ment tests, not a zero point, but the mean performance of
children between the ages 12 and 13 years. He thinks that
such a point could be used for any mental trait regardless
of the location of its absolute zero, if such there be.³ He
would then measure the school achievements of children in
other grades in terms of the 12-year-old children. The
grades ranged from 5 S. D. (standard deviation) below
the mean score of the 12-year-olds, the mathematical zero
of the scale, to 5 S. D. above the mean score of the 12-year-
olds. Each S. D. was divided into 10 units, making the
entire range from 0 to 100 with 50 as the mean score of
the 12-year-old children.

¹ "Proposed Uniform Method of Scale Construction," Teachers
College Record, Vol. 22, No. 1, Jan., 1921, pp. 31-51.
² See Chapter X for a definition of standard deviation.
³ Op. cit., p. 43.

By this method any pupil who makes a score of 50 has 
an ability equal to the mean ability of the 12-year-old 
children, and a pupil who makes a score of 40 has an 
ability of 10 units, or one S. D. below the mean ability of 
12-year-olds. A pupil with a score of 75 is 2.5 S. D. above
the mean ability of 12-year-olds, and so on.

McCall gives four reasons why the mathematical zero is
located 5 S. D. below the mean instead of at the mean:
(1) this procedure eliminates cumbersome plus and minus
signs; (2) it forms a convenient range of points between
1 and 100 with the reference point at the easily remem-
bered 50; (3) this procedure carries the scale down and
up as far as any one will need to go; (4) it gives a mathe-
matical zero which is close to the supposed absolute zeros
for reading, spelling, writing, composition, completion, and 
other typical mental functions. 
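
The arithmetic of the scale just described may be sketched as follows. The sketch assumes a simple linear conversion of a raw score by the mean and standard deviation of the 12-year-old group; the source article may derive its scale values by a different route, and the function names and the sample mean and S. D. are hypothetical.

```python
def scale_score(raw, mean_12, sd_12):
    """Place a raw score on the 0 to 100 scale: 50 at the mean of the
    12-year-olds, 10 units to each S. D., zero at 5 S. D. below the mean."""
    return 50 + 10 * (raw - mean_12) / sd_12


def sd_from_mean(score):
    """Distance of a scale score from the 12-year-old mean, in S. D. units."""
    return (score - 50) / 10


# Hypothetical test on which 12-year-olds average 30 with an S. D. of 6.
print(scale_score(30, 30, 6))  # 50.0, the mean ability of 12-year-olds
print(scale_score(24, 30, 6))  # 40.0, one S. D. below the mean
print(sd_from_mean(75))        # 2.5
```

No plus or minus signs appear in the scores themselves, which is the first of the four reasons given for placing the zero 5 S. D. below the mean.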

One objection made to this scale is that the "times state- 
ment" cannot be employed in dealing with mental traits 
because the absolute zero point is not found. That is, one 
cannot say that John has three times the ability of James 
in a certain subject unless the absolute zero point of ability 
is determined. Nevertheless, 5 S. D. below the reference
point cited above gives a mathematical zero that cor- 
responds reasonably well with the absolute zeros deter- 
mined in most scale making. And, when all is considered, 
the best way to appreciate ability of an individual is to 
refer him to the mean ability of his own or some standard 
group. 

Another objection urged i 
is that any score above or 1 
indicate whether the pupil j 
particular trait, because a 1 
little of a certain trait and : 




This defect is to be remedied by the use of the absolute
zero of the trait, provided such can be found. 

The time of birth is used in most scales for measuring 
general intelligence as the point of reference. That is, in 
attempting to measure the general intelligence of an in- 
dividual, his score is measured in terms of years and months 
of mental age. 

McCall would not only standardize the points of refer-
ence in mental measurements but he would also standardize
the units, and in each case would use some function of
the variability of 12-year-old children, preferably the
standard deviation. Thorndike and his students have con-
stantly used some function of variability as the unit of
measure. 

The Binet-Simon Scale revised by Terman has met the 
qualifications for scientific scale building in that it has a 
definite tangible point of reference, the time of birth; it 
is simple and objective and is easily understood. But it 
fails to meet one condition for an ideal scale in that the 
units of the scale are not equal. The development of 
general intelligence is measured in years and months. The 
interval between eight and nine years, for instance, is 
larger than the interval between 14 and 15. In certain 
traits the unit above the age of 16 becomes zero. Because 
of the effects of social conditions, it becomes difficult to 
build up a scale on the age basis which will measure satis-
factorily the general intelligence of pupils below the age
of eight and above the age of 12. Hence, a scale of this 
kind cannot satisfactorily score pupils with exceptionally 
low or with exceptionally great ability. Judgment scales 
discussed above may be converted into scales of this kind 
thereby making all scales performance scales.*

* For a more elaborate discussion of the T scale the reader is referred
to the "Proposed Uniform Method of Scale Construction," by McCall,
Teachers College Record, Vol. 22, Jan., 1921.

2. Making the Steps of Equal Magnitude. — Three
methods are in common use in making the steps of a scale
of equal magnitude. 

(a) The Method by Competent Judges. — Many things in 
education must of necessity be left to the consensus of
opinion of competent judges. The relative merits of two
drawings, for instance, must always be determined in this
way. Or, if we do not use the functional method in hand- 
writing, the relative merits of the various samples to be 
measured may be determined by competent judges. By 
this method we may say that differences in merit between 
samples of handwriting, for instance, are equal when they 
are noticed by an equal number of competent judges. For 
example, if we had 1,000 of the best judges in handwriting 
in the world, and 750 of them were to judge a certain 
sample of handwriting designated by the figure 10 as better 
than another sample designated as 9, and the same number 
would say that sample number 11 was better than sample 
number 10, then we might say that sample 10 is as much 
better than 9 as 11 is better than 10 because the differences 
were noted by an equal number of competent judges.

This method rests on the fundamental assertion that 
equally-often-noted differences are equal. Owing to the 
nature of education, it is probable that a large part of the 
pedagogical scales of the future will be based on the con- 
sensus of judgments by competent judges. In drawing, 
English composition, music, handwriting, and such sub- 
jects, it is practically impossible to measure the results save 
by means of scales thus designed. 

It should be noted in passing that the whole theory of 
scale development may be classified under two general 
methods: the judgment method, as the Thorndike and
Hillegas scales, and the ratio method, followed by Ayres
in making a spelling scale, and by Trabue in making a
language scale. It may readily be shown that the school
subjects are susceptible of arrangement in a certain serial
order which will indicate the method of scale derivation
applicable to them. At one end of the series will be such
subjects as spelling and arithmetic, which lend themselves
to the ratio method, that is, to the expression of the rela- 
tion between the actual number of correct responses and 
the possible number of correct responses. At the other 
end of the series are such subjects as composition and pen- 
manship. Scales for these subjects must be derived by the 
judgment method. One sample of handwriting is better 
than another not as a fact but as an opinion. A specimen 
of English writing is better than another precisely because 
competent judges think it is better. Between the two ex- 
tremes of school subjects are a number of subjects such 
as history, geography, and literature, which may be meas-
ured by either method.

(b) By the Functional Method. — By this method the
quality of the thing is not measured directly, but indirectly 
by measuring the degree to which the particular thing 
or product functions. The Gettysburg Handwriting Scale 
is an example of this kind. By this method two samples 
of handwriting were said to be equally legible if readers 
could read one sample as rapidly as the other. The 
method of making the steps equal is as follows: Suppose 
three samples of handwriting are being considered, of 
which expert readers can read sample A at the rate of 
100 words per minute, sample B at the rate of 120 words 
per minute, and sample C at the rate of 140 words per 
minute ; then we may say that sample B is as much better 
than sample A as sample C is better than sample B, and 
that therefore the steps are equal.

Measures in the physical sciences are quite often made 
not by measuring the thing directly but by measuring some- 
thing that varies with it. For instance, we do not measure 
heat directly but measure the length of a mercury column 



[1 



204 EDUCATIONAL MEASUREMENT 

which we know varies directly with the amount of heat 
in the body. By this method we know that the amonnt 
of heat that it takes to raise a column of mercury from 
a point marked 10 on the scale, for instance, to one marked 
11, is the same, approximately at least, as the amount of 
heat that it takes to raise the mercury column from 11 to 
12 or from 15 to 16. By the same principle, Ayres es- 
tablished approximately equal steps on his writing scale 
by assuming that progressive degrees of merit varied di- 
rectly with the "readability," or the speed with which the 
various samples might be read. To the physicist those 
differences are equal which are produced by the same cause 
under the same circumstances, or under which the same 
conditions produce the same effect.

For a long time we measured fatigue by measuring the 
distance between the points of the aesthesiometer when
placed on the skin and, although the correspondence be- 
tween cutaneous insensitivity and fatigue has been more 
or less discredited, it is not discredited on the ground that 
the fatigue element could not be measured in this way, 
provided there is a correspondence. 

Perhaps it should be noted in passing that none of these 
scales approaches perfection. They are still crude, but 
much better than the old methods based on personal
opinion. 

The Ayres Gettysburg Handwriting Scale has been 
criticized especially on the ground that in making the scale 
the different qualities of handwriting were determined by 
the rapidity with which the samples might be read, but 
when the scale is used in the school room the merit of a 
particular sample is determined not by how rapidly it may 
be read but by the nearness to which it approaches in form 
and general appearance a sample the readability of which 
is known. That is, the scale was made from a functional 
standpoint, the speed with which the various samples might
be read; but when a sample is to be measured, its quality
is determined not by the speed by which it may be read
but by the nearness it approaches in form and general
appearance to a sample on the scale.

(c) The Proportion-of-Pupils-Solving Method. — In a
spelling test, for instance, the words may be ranked accord- 
ing to spelling difficulty by determining the number of 
children that fail to spell them. For example, if three of 
the words are home, church, and separate, and 95 per cent 
of the children are able to spell the word, home, 90 per 
cent are able to spell the word, church, and 85 per cent 
are able to spell the word, separate, we may be assured 
that the words are arranged according to their spelling 
difficulty from the easiest to the most difficult. It does 
not follow, however, that since 5 per cent fewer pupils 
were able to spell the word, church, than were able to 
spell the word, home, and since 5 per cent fewer were able 
to spell the word, separate, than the word, church, that the 
increase in spelling difficulty from home to church equals 
the increase from church to separate. 
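
The point may be checked numerically. The sketch below assumes that spelling ability is normally distributed and places each word at the point on the base line below which lie the pupils who fail on it; the function name is chosen here for illustration.

```python
from statistics import NormalDist


def difficulty_in_sigma(percent_correct):
    """Base-line position of a word, in sigma units from the median, such that
    the per cent of pupils failing on the word lies to the left of it."""
    return NormalDist().inv_cdf(1 - percent_correct / 100)


home = difficulty_in_sigma(95)
church = difficulty_in_sigma(90)
separate = difficulty_in_sigma(85)

print(round(church - home, 2))      # 0.36 sigma from home to church
print(round(separate - church, 2))  # 0.25 sigma from church to separate
```

Equal differences of 5 per cent thus correspond to unequal distances in sigma, the step being larger toward the tail of the distribution.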

In order to make the steps of equal difficulty, the normal 
distribution curve is brought into use. We may illustrate 
how the steps are made equal, or approximately so, by
reference to the headings of the Ayres Spelling Scale,
Figure IV. Dr. Ayres divided the words in his spelling
scale into 26 divisions or columns. The words in each
column are of approximately equal spelling difficulty, and
the steps in spelling difficulty from each column to the
next are approximately equal. The figures at the top of
the scale indicate the approximate average scores of correct
spellings that may be expected among children of the dif-
ferent grades and of the same grade. Thus, in column K,
for instance, the 58 at the head of the column is the average
score that should be expected from second-grade pupils
attempting to spell the words in this column. The average
score for third-grade pupils is 79, for fourth-grade pupils, 
92, and so on. The numbers 99, 98, 96, 94, 92, etc., at the
top in Figure IV for the second grade are as near the 
mid-points of the equal steps as can be obtained without 
using fractions. That is, the step from column A to column 
B equals the step from column B to column C, and so on. 

In making the spelling scale he assumed that the spelling 
ability in any one grade is distributed according to the 
normal probability surface. 




Figure V. Illustrating the distribution of spelling abilities
(adapted from Ayres)

That is, taking any grade, as the third, for instance, and 
representing it by a normal distribution curve as in Figure 
V, we note that at the extreme left, the curve is very near 
the base line which indicates there are very few excep- 
tionally poor spellers. In the middle, the curve is the 
greatest distance from the base line thus representing a 
large proportion of medium spellers. The median line, in 
Figure V, represents the 50 per cent in the third grade, 
Figure IV. The horizontal line, xy, from the median to 
the curve represents 1 sigma distance (sigma, σ, is the unit
for standard deviation) and intersects the curve at a point
at which it changes from convex to concave.

This distance is always a constant function of the curve
of normal distribution and in the Ayres study was chosen
as the unit of measure along the base line. He laid off
on the base line to the left of the line, MN, a distance
equal to 2.5 sigma. The part from N to B is 0.5 sigma,
from B to D and from D to P is 1 sigma each. Since the
curve does not meet the base line at 2.5 sigma from N, but
theoretically meets it only in infinity, we may assume that
since only 0.62 per cent of the area between the curve and
the base line lies to the left of point P, that for practical
purposes it meets the base line at 2.5 sigma from N and
this point may be considered as zero. 

Ayres thus divided the base line into five equal parts. 
He called the extremes from left to right, 0 and 100 respec-
tively. Assuming, as he did, that the entire frequency
might be included between points 2.5 sigma to the left of
N and 2.5 sigma to the right of N, he found that 7 per
cent of the entire area between the curve and the base line
lies to the left of the line erected at point D. Between this
line and the perpendicular to the base line at point B are
24 per cent of the cases. Between the lines erected at
points B and A are 38 per cent of the cases; between A
and C are 24 per cent of the cases; and between C and Q
are 7 per cent of the cases. In scaling the words he found
that 7 third-grade children out of 100 failed to spell the 
word "has," while 93 per cent spelled it correctly. 

Applying this fact to the curve, the word "has" would 
be located at point 20 on the base line which would have 
7 per cent of the cases to the left of it and 93 per cent 
at the right. In a similar way, a word missed by 31 per 
cent of the children would be located at point B on the 
base line. By the same method all the words were thus 
located in the spelling scale. 
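
The percentages quoted above may be verified from the normal curve. In the sketch below the division points are taken at 2.5, 1.5, and 0.5 sigma on either side of the median N, as the text lays them off, and each sigma is counted as 20 points of the base line; the names used are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Division points of the base line, in sigma units from the median N.
# P is called 0 and Q is called 100, so each sigma covers 20 points.
POINTS = {"P": -2.5, "D": -1.5, "B": -0.5, "A": 0.5, "C": 1.5, "Q": 2.5}


def percent_left_of(z):
    """Per cent of the area under the curve lying to the left of z sigma."""
    return 100 * STD_NORMAL.cdf(z)


def percent_between(lo, hi):
    """Per cent of the area under the curve between lo and hi sigma."""
    return 100 * (STD_NORMAL.cdf(hi) - STD_NORMAL.cdf(lo))


def scale_point(percent_missing):
    """Base-line position (0 to 100) of a word missed by the given per cent."""
    z = STD_NORMAL.inv_cdf(percent_missing / 100)
    return 20 * (z + 2.5)


print(round(percent_left_of(POINTS["P"]), 2))                # 0.62
print(round(percent_left_of(POINTS["D"])))                   # 7
print(round(percent_between(POINTS["D"], POINTS["B"])))      # 24
print(round(percent_between(POINTS["B"], POINTS["A"])))      # 38
print(round(scale_point(7)), round(scale_point(31)))         # 20 40
```

The word missed by 7 per cent of the pupils falls at point 20 (point D) and the word missed by 31 per cent at point 40 (point B), in agreement with the text.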



By dividing each of the five divisions on the base line
into five equal parts, Dr. Ayres made a total of 25 steps
ranging from 0, or near 0, to 100 in his spelling scale. The
average of all the values that might theoretically be con- 
tained in each of these 25 steps has thus been determined 
to the nearest whole number and this value has been as- 
signed to the step. These 25 values are 100, 99, 98, 96, 94, 
92, 88, 84, 79, 73, 66, 58, 50, 42, 34, 27, 21, 16, 12, 8, 6, 4,
2, 1, 0. The limits of these values are as follows:






50    from 46 to 54



Noting the scores 100, 99, 98, 96, 94, etc., in the top row 
and at the left in Figure IV, we see that the steps do not 
appear to be equal since they are 1, 1, 2, 2, 2, 4, etc. In 
order to make the steps of spelling difficulty equal from 
one column to another the average scores are computed in 
per cent and transmuted into units of standard deviation. 
This may be done by referring to tables for converting 
the per cent failing into units of standard deviation.

We shall note more specifically the curve in Figure V to
illustrate two measures of deviation that are used in mak-
ing the steps equal. Theoretically, the curve reaches the
base line only in infinity. The area between the curve and
the base line is divided into 10,000 equal parts and each
particular section carefully mapped out so that it is known
exactly where lines perpendicular to the base line must
be erected on each side of the line MN to include the middle
half of the area of the figure, or any other fraction of it.
When lines are erected at equal distances from the line MN
so that the area between the curve, the base line, and these 
two perpendicular lines is equal to one-half the total area, 
the distance of these perpendicular lines from the line MN is 
known as the probable error. In other words, the probable 
error is a distance on either side of the measure of central 
tendency (MN) that will include the middle half of the 
measures. It is evident that the line PQ may be divided 
into any desired units of probable error (or P. E., as the 
units are usually designated). It is also evident that a 
unit's distance on the line PQ near the central part of 
the figure would include a much larger area than the same 
linear unit would out near the ends of the curve. The area 
included between two perpendicular lines erected at units 
distance apart at some column as N, Figure IV, for in- 
stance, may be eight or ten times as great as the area in- 
cluded between two lines similarUy drawn at or near 
columns T or Z. 

Lines erected at equal distances from the line MN so as
to include the middle 68.26 per cent of the area between
the base line and the curve are said to be erected at a distance
of one sigma (σ) from the line MN. Sigma (σ)
represents the standard deviation. It is evident, therefore,
that the line PQ may be divided into any number of equal
parts, each part being represented by sigma or a fraction
thereof. As in the case of P. E., two lines erected perpendicular
to the base line near the center of the curve
and at sigma's distance apart would include a far greater
per cent of the area than if erected at sigma's distance
apart but near the end of the curve.
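
The two measures just described may be checked numerically. The following is a minimal sketch, assuming the curve in Figure V is the normal curve with the line MN at its center; the function names are illustrative and do not come from the text. Modern tables give the area within one sigma as 68.27 per cent, which agrees with the 68.26 quoted above to within rounding.

```python
from statistics import NormalDist

# Assumption: the curve is the standard normal curve, MN at zero, sigma = 1.
unit_normal = NormalDist(mu=0.0, sigma=1.0)


def middle_area(half_width_sigma: float) -> float:
    """Fraction of the total area within +/- half_width_sigma of the line MN."""
    return unit_normal.cdf(half_width_sigma) - unit_normal.cdf(-half_width_sigma)


def strip_area(left_sigma: float, width_sigma: float = 1.0) -> float:
    """Fraction of the area between two perpendiculars width_sigma apart."""
    return unit_normal.cdf(left_sigma + width_sigma) - unit_normal.cdf(left_sigma)


# The probable error is the distance from MN that encloses the middle half,
# that is, the point below which three-fourths of the area lies.
probable_error_in_sigma = unit_normal.inv_cdf(0.75)

print(round(middle_area(1.0), 4))          # 0.6827
print(round(probable_error_in_sigma, 4))   # 0.6745
# A strip one sigma wide next to MN holds far more area than one near the end.
print(strip_area(0.0) > strip_area(2.0))   # True
```

On these assumptions one P. E. is about two-thirds of one sigma, so a scale laid off in either unit has equal steps; only the size of the step differs.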

Now if one transmutes the per cents given at the head
of each column in the Ayres Spelling Scale for any particular
grade, as for instance the third, into units of
standard deviation, or sigma, it will be found that in
passing from left to right the words grow progressively
more difficult by approximately equal steps, and that the
size of the step is 0.2 sigma. Standard deviation is discussed
more fully in Chapter X.
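
The transmutation may be sketched as follows, assuming a normal curve and using the 25 step values quoted earlier in place of a printed conversion table. The names are illustrative only. The values 0 and 100 are left out because they lie at infinity on the base line.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # assumed normal curve, mean 0 and sigma 1

# The 25 step values of the scale, in per cent, as quoted in the text.
AYRES_VALUES = [100, 99, 98, 96, 94, 92, 88, 84, 79, 73, 66, 58, 50,
                42, 34, 27, 21, 16, 12, 8, 6, 4, 2, 1, 0]


def percent_to_sigma(percent: float) -> float:
    """Distance from the center, in sigma, below which `percent` of cases lie."""
    return unit_normal.inv_cdf(percent / 100.0)


interior = [v for v in AYRES_VALUES if 0 < v < 100]
positions = [percent_to_sigma(v) for v in interior]
# Distance in sigma between each value and the next one in the list.
steps = [upper - lower for upper, lower in zip(positions, positions[1:])]
mean_step = sum(steps) / len(steps)

for value, step in zip(interior, steps):
    print(f"{value:3d} per cent -> step of {step:.2f} sigma to the next value")
print(f"mean step: {mean_step:.2f} sigma")
```

Because each per cent value is rounded to a whole number, the steps near the ends of the scale come out less even than those near the middle, which is consistent with the text's "approximately equal."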

3. The Scale Must Measure the Desired Educational 
Product. — It is not always easy to tell just what a test 
measures after it is given. For instance, it is very doubt- 
ful just what the Trabue language tests measure. They may 
measure general intelligence or language ability or both. 
The scores are very difficult to interpret. Suppose a pupil 
makes a poor score on these tests. What shall the teacher 
or the administrator do as a result of it? Because of 
their general nature, little guidance comes to the teacher 
from tests of this kind. On the other hand, if a pupil 
makes a low score on the Charters pronoun tests, the 
teacher knows immediately where more drill work should
be done. It gives her a point of departure. She knows
that she has tested the pupil in a specific field and knows
his weak points. When scores are a result of many
variables it is difficult to tell just what the test has really
measured. Care should be taken, therefore, to see that
the test really measures what it is designed to measure.

4. The Test Must Be so Simple in Its Application that 
It Is Adapted to the Classroom. — The only way that it is 
possible to determine the intellectual achievement of a pupil 
is by what he does. In school achievement tests, pupils 
are asked to do a great many things, and it is evident 
that if the directions for giving the tests cannot be well 
understood by the pupil, the product will not be a correct 
measure of the pupil's ability. Any one who has given
tests knows that even when the instructions are so simple
that it would seem almost impossible for a pupil to misunderstand
them, yet, in a class of 40 pupils there
will be one or two who do not know what the test calls
for.

The ideal test must be of such a nature that the record-
ing of answers, or the execution of the design, if a draw- 
ing is called for, may be done with the minimum amount 
of time and energy, so that practically all the effort will 
go towards the solution of the questions or problems 
rather than recording the answers when once worked out 
and made. 

5. Tests Must Not Require an Undue Amount of Time
in Administration. — Tests that require an undue amount 
of time in administration will be neither popular nor 
practical. Both teachers and pupils dislike tests that 
require a long time to do them and that require much 
writing to be done. In tests of this kind much energy 
is wasted in recording the answers after the problems 
have been solved mentally or after the correct answers 
have been determined to questions. Furthermore, the 
labor in scoring is so great that too much of the teacher's 
time must be spent in reading the papers. It isn't always 
easy to design a test that will satisfy these conditions, 
but the popularity of the test will depend very largely 
on these two points. The mechanical phases of test mak- 
ing are as important from the standpoint of administration 
as are the thought phases. Some of the most scientifically 
constructed tests we have, from the standpoint of ac- 
curately measuring the educational processes and prod- 
ucts, are so mechanically clumsy that teachers dislike 
to use them. It may even be desirable that something be 
sacrificed in accuracy in order to increase the practicality 
of the tests. This is safe, however, only in so far as the
net gain is greater than if most of the emphasis were
placed on accuracy. The point may be illustrated from
an example in the business world. From the standpoint
of accuracy the grocer might obtain a pair of scales that
would measure sugar to one-one-hundredth of an ounce;
but, practically, the net gain for both customer and grocer
would be less than if he measured much less accurately.



CHAPTER VII

SCORING THE TESTS AND TREATMENT OF THE MEASURES

In this chapter the problems incident to the scoring 
of tests and the distribution of the measures after the 
tests are scored will be presented. While the teacher 
desires to know what each individual pupil is able to do on 
a test she also wants to know how the pupils stand as a 
class and how the class ranks when measured by es- 
tablished standards. 

Problems of Scoring. — The problems of scoring are 
many and varied. The particular design of the test is 
very largely determined by the way the scores are to be 
obtained. In Chapter V we noted the problems incident 
to test making in a general way. In order to bring out 
more clearly the problems that must be met and solved 
in designing a test and especially to make clear the in- 
fluence that scoring has on the general design of the test, 
the actual problems which confronted the makers of the 
Gregory-Spencer Geography Test¹ will be presented.

Example in the Development of a Geography Test. — 
The first problem that confronted the makers of this test 
was the question as to whether the test should be strictly 
diagnostic, covering quite thoroughly some specific field 
of geography, or a general test covering in a general way 
the entire field. The decision of this question was made
only after a great deal of research work had been done
in attempting to find out what the best writers in the field
of geography considered the purpose of geography to be.
It was then necessary to consult the standard texts in
geography in order to learn what subject-matter they contained
and how the subject-matter was divided, and to
make an estimate from the amount of space given to the
various topics as to their relative values. There is probably
no subject in the curriculum that lacks focus in presentation
more than the subject of geography. The field
is so broad and it contains so many phases where emphasis
might be placed, that after one has exhausted every scientific
means available, he must still rely in part on his
personal judgment as to what constitutes the proper subject-matter
for the test.

¹ Designed by C. A. Gregory, Professor of School Administration,
University of Oregon, and Peter L. Spencer, Instructor in the University
High School, University of Oregon. Published by the Bureau
of Educational Research, University of Oregon.

The question as to whether the test should be general
or diagnostic was arbitrarily decided by making it, to a
limited degree, diagnostic. Since the kind, amount, and
order of presentation of geographic material are determined
to a very large degree by the textbooks used, it
would be folly to design a test covering subject-matter of
a different kind. In the design of the test, therefore, the
subject-matter actually being taught, rather than the subject-matter
that ought to be taught, furnished the material
for the test.

It is not the business of those designing a test to say
whether the things taught are the things that ought to be
taught. The purpose of the test is to determine how well,
or to what degree, the children know and can do the things
they are being taught. There are many lines of cleavage
in the subject-matter of geography, and it was necessary
to make divisions of some kind. It is possible to divide
the field into physical, political, and commercial geography
and make the tests with this division in mind. Or we
might think of the subject-matter being divided into causal
geography, place geography, mathematical geography, map
study, etc. Even granting that the test was to be made
according to any one of these divisions, since the subject-matter
in each division is so great that it could not possibly
be covered by one, or even two or three tests, it was necessary
to decide what particular bit of subject-matter should
supply the material for the test.

State Examination Questions Examined. — It was thought
that the questions prepared by state departments that are 
to serve as state examinations in geography might give 
some guidance in the preparation of the test. A letter 
was accordingly addressed to each state asking for lists 
of geography questions that the state departments prepare 
for the seventh and eighth grades. Forty-seven states re- 
plied and twenty-three prepared such questions. Some of 
the states sent questions dating back five years. The total 
number of questions thus received was about 1,300. After 
eliminating the local questions such as, "Bound your state 
or county," "Name the counties of your state," etc., the 
questions were compiled according to continents, countries, 
industries, etc. One of the striking characteristics of the 
questions is the fact that they lack focus. They are very 
widely scattered. 

Another source of information was a representative
group of courses of study, which were consulted for the 
purpose of finding out what phases of geography were 
there emphasized. 

The Divisions Chosen. — Applying the information derived
from the above sources, supplemented by our personal
judgments, it was decided to divide the subject-matter
for the test into the following divisions: (1) Place and
fact geography, which tests the pupil's knowledge as to
where important cities, rivers, and seas are, and also his
knowledge as to the distinguishing characteristics of a large
number of cities. (2) Causal geography, which attempts
to discover the pupil's power to reason from cause to effect.
This phase of the test was divided into two parts, one
pertaining exclusively to the United States, and the other
pertaining to the world as a whole. (3) Commercial
geography. (4) Political geography.

The Selection of the Cities to Be Used in the Test. — 
Having decided that place and fact geography should con- 
stitute two of the major parts of the test, the next problem 
was to find out what cities, rivers, seas, etc., should be 
located, and what facts concerning them should be called 
for in the tests. In other words, since all the important 
cities could not be located in a test of this kind, and all 
the facts concerning the cities thus located could not be 
presented in the test, the problem of city selection was one 
that called for some method of choosing the most important 
cities. This was done in the following way. In the summer 
of 1910 Professor Whitbeck conducted a geographical 
seminar in Cornell University and had in his class about 
75 teachers, principals, and superintendents representing 
21 states. This class was divided up into committees and 
a continent was assigned to each committee with the instructions
that the committee was to find the most important
cities in it. They were to find cities that were
so important that an American school teacher should teach
their location rather accurately. They were also to teach
why they were important and for what they stand in world
affairs. It was agreed that a city to be included in the
list must stand for more than one important thing. Lyons,
for example, though it is the leading silk-making city of
the world, has nothing else of importance that an American 
school boy needs to know. Hence it was not included. 

These committees decided upon the lists of cities that
should be taught and passed them over to a committee
of the faculty on geography to be passed upon. This committee
consisted of Professor E. H. Whitbeck; Professor
Ralph S. Tarr, Cornell; Professor Albert T. Brigham, Colgate;
Professor Charles McMurry; Philip Emerson, of
Lynn, Mass.; and George D. Hubbard of Ohio State University.
Two-thirds of the cities listed by the first committee
failed to pass the faculty committee. Any city of the
United States that received two or more votes from the
faculty committee was retained. No foreign city was retained
unless it received at least three of the six votes of
the faculty committee. The cities selected, together with
the number of votes each received from the faculty committee,
are given below:




United States — 25 Cities

New York, 6                    Washington, 6
Chicago, 6                     Denver, [illegible]
Philadelphia, 6                Louisville, [illegible]
St. Louis, 6                   Minneapolis-St. Paul, 6
Boston, 6                      Kansas City, 2
Baltimore, 2                   Indianapolis, [illegible]
Cleveland, 3                   Duluth-Superior, 5
Buffalo, 3                     Salt Lake, 3
Pittsburg, 6                   Puget Sound Cities, 4
San Francisco, 6               Scranton-Wilkes-Barre, [illegible]
Cincinnati, 2                  Galveston, 4
New Orleans, 6                 Lowell, 3
Milwaukee, 2

Foreign Countries

Europe — 16

London, 6                      [illegible], 6
Edinburgh, 6                   Athens, 6
Glasgow, 6                     Constantinople, 6
Madrid, 4                      St. Petersburg, 6
Berlin, 6                      Paris, 6
Hamburg, 6                     Marseilles, 3
Vienna, 4                      [illegible], 4
Rome, 6                        Liverpool-Manchester, [illegible]

Asia

Bombay, 6                      Hong-Kong, 5
Calcutta, 6                    Jerusalem, 6
Canton, 5                      Tokio-Yokohama, 6
Pekin-Tien-tsin, 6             Mecca-Medina, 3

Western Continent Exclusive of United States

Montreal, 6                    Buenos Ayres, 6
Quebec, 5                      Havana, [illegible]
Rio Janeiro, [illegible]       Mexico, 6

Africa, Australia, and Islands of Sea

Cairo, 5                       Sidney, 3
Cape Town, 5                   Manila, 6
Johannesburg, 3                Batavia, 3
Melbourne, 4                   Honolulu, [illegible]



The material having been selected and the amount hav- 
ing been determined upon, the next step was the actual 
drafting of the questions and statements that entered into 
the test. If the test is to be entirely objective, the personal 
equation of the teacher or other individual grading it must 
be entirely eliminated. If, therefore, pupils were allowed to 
frame their own answers to the various parts of the test it 
would call for an evaluation of these answers by the one
scoring the papers. That is, it would rest on the judgment
of the teacher whether the answer given by pupil A was of
the same value as that given by pupil B. For instance, if
the question, "Why is it dry east of the Rocky Mountains?"
were asked there would be a variety of answers with varying
degrees of merit, and since each answer might contain
an element of truth, the one scoring the papers would be
called upon to evaluate these answers. In order to
eliminate this difficulty and also to simplify and lessen
the labor in scoring the papers, the answers were framed
by those making the test and the pupil was called upon
to put a cross after the best answer given. The following
statements taken from the test, together with the instructions
for giving them, illustrate the point.

Below are given several facts about the United States. Three
causes are suggested for each fact. Read the statements carefully,
then place a cross (X) before the cause which you think
best explains the fact.

1. The plains directly east of the Rocky Mountains are dry
because:
(a) Few trees grow on them.
(b) The winds lose their moisture before they get to them.
(c) The land slopes eastward.

2. A large number of the people of Pennsylvania are engaged in
the manufacture of iron and steel products, because:
(a) They have no lumber with which to build.
(b) Pennsylvania has many great iron mines.
(c) Pennsylvania has much coal with which to smelt the iron

3. Seattle is farther north than Chicago, yet it has a milder
climate because:
(a) Seattle is protected by the Rocky Mountains.
(b) Seattle is protected by heavy forests.
(c) The ocean modifies the winds which blow over it.

4. Cattle are raised on the Great Western Plains and are fattened
and prepared for market on the prairie lands farther east,
because:
(a) The market is in the east and the prairies produce much
(b) It is warmer on the prairies and they afford better protection.
(c) The eastern people are richer and can afford to buy
them.

5. New York City is called the Gateway to America, because:
(a) It is easy to get to the interior through this port.
(b) It is the largest city in America.
(c) It is on a navigable river.




Advantages of Tests Thus Designed. — There are at least
five advantages of tests designed like those cited above:

1. Time is saved for the pupil. — The ideal test in a
subject such as geography is one in which the pupil may
cover the maximum amount of subject-matter in a minimum
amount of time, and in which practically all the
time may be spent in determining the proper answer or
disposition of the parts of the test with a minimum amount
of time in writing out or recording the answer thus formu- 
lated. In the first part of the test cited above, for instance, 
it is much easier simply to put a cross before the correct 
statement than to express the thought in the proper 
language and take the time to write it down on paper. 

2. Personal equation is eliminated. — Each pupil is given 
a fair and unbiased evaluation of his ability in the subject- 
matter in question. If he has the cross before the correct 
answers his score will be as high as that of any other 
pupil in the class. The personal equation of the teacher 
is entirely eliminated. The pupil may look over his paper
and know that the score given him is correct. The bonds
of friendship between him and his teacher are thus
strengthened; otherwise, there is, many times, a strained
relation between pupil and teacher when the pupil thinks
the teacher has not properly evaluated his paper.

3. Much time is saved in scoring. — If pupils were 
allowed to formulate their own answers, each word in the 
test would have to be read by the one scoring the papers, 
thus increasing the amount of labor many times. By the 
new method and with the aid of a key in the hands of
the one scoring the papers, the labor is reduced to the minimum
and much of the drudgery is eliminated.

4. More ground may be covered by a test thus designed.
— A pupil may cover four or five times as much ground
by a test thus designed in the time allotted, and it may
be covered more thoroughly. He is less fatigued, because
the labor of writing is eliminated, and he does not dislike
a test of this kind because most of the drudgery has been
removed and he knows he will get an unbiased evaluation
of his work when it is completed.

5. Pupils must review the whole field in preparing for
the examination. — Since pupils know that the entire field
cannot be covered in an examination as now given, they
spend considerable time trying to determine what questions
will probably be asked, and spend their time on them
instead of reviewing the entire field. The old alibi that in 
reviewing for the examination the pupil studied every part 
of the work but that covered by the examination would 
no longer apply, since the whole field would be covered. 

Determination of the Scores. — It is extremely difficult 
for some teachers to believe that such a test as the one 
suggested above does anything more than give the highest 
score to the luckiest guesser. They say it is a game of 
chance that cannot possibly give a correct estimate of the 
pupil's school achievements. In spite of this prejudice, 
chance is fatally exact, and it is on this principle that the 
scoring may be done with an assurance that the real knowl- 
edge of the pupil may be determined in a test of this kind 
or, at least, as accurately as by the old method. 

If one were to take a hundred pennies and toss them 
into the air one hundred times and count the number that 
fell heads up and the number that fell tails up, he would 
find that in the 10,000 tossed almost exactly half of them 
would fall heads up. 
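
How closely "almost exactly half" holds may be computed directly from the laws of chance. The sketch below is an illustration only: the limits of 49 and 51 per cent are an assumption chosen for the example and are not taken from the text.

```python
from math import comb


def prob_heads_between(n_tosses: int, low: int, high: int) -> float:
    """Exact chance that a fair penny tossed n_tosses times shows between
    low and high heads, both limits included."""
    favourable = sum(comb(n_tosses, k) for k in range(low, high + 1))
    return favourable / 2 ** n_tosses


# 100 pennies tossed 100 times is 10,000 tosses in all.
chance = prob_heads_between(10_000, 4_900, 5_100)
print(f"Chance of 49 to 51 per cent heads: {chance:.3f}")
```

On these assumptions the count of heads falls within one per cent of one-half in roughly nineteen trials out of twenty.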

If, instead of there being three statements to choose 
from in giving the correct reasons in the above test, there 
had been only two, the chance would be analogous to toss- 
ing pennies. A student who knew absolutely nothing about 
the statements would get them right approximately 50 per 
cent of the time by mere chance. Then the question arises 
as to how the papers should be scored to take proper
cognizance of this game of chance. To do this we shall
take a number of hypothetical cases to show that it is
possible to make the proper evaluation in each case.

1. When there is a choice between two answers. — Let us
suppose there are 20 parts to the test and the pupil actually
knew 10 of them. Then he would mark his paper as
follows: He would put a cross before the 10 he actually
knew and guess at the other 10. By the laws of chance
he would get 50 per cent of them right. Therefore, his
paper would have 15 of the 20 parts of the test marked
correctly. In order to make proper allowance for his
chance scores we should subtract the number he marked
incorrectly (5) from the number he marked right (15) in
order to get his actual score, 10, because from our hypothesis
we know that the number marked incorrectly would be
half the number he guessed at. Therefore, his actual score
would be 10, which, according to our hypothesis, is what
he actually knew about the test.

Suppose again a pupil actually knows 16 of the 20 parts
of a test. How would the scores show this fact? Since he
actually knows 16 of the 20 parts of the test he would put
the proper check marks before these 16 parts and guess at
the other 4 parts. He would get 2 of these four parts
right by the laws of chance. Therefore, his paper would
have 18 answers marked correctly. Subtracting the number
he had wrong (2) from the number he answered correctly
(18) gives him a final score of 16, which is according
to hypothesis.

2. Where there is a choice among more than two answers.
— In the geography questions cited above, the student has
a choice of one of three answers. By the laws of chance
he would, therefore, get 33⅓ per cent of them right even
if he knew absolutely nothing about them. How would
the papers be scored in a case of this kind? Let us take
the case cited above and suppose there are 20 parts to
the test and that the student actually knows 11 of them.
He would, therefore, put the proper check mark before
the 11 that he knew and guess at the other nine, three of
which he would get right by chance. His paper would,
therefore, show 14 parts marked correctly. Now since he
would get but one in three right by chance of the number
he guessed at, we know that the number marked incorrectly
(6) would be twice as large as the number he
marked correctly by chance (3). Therefore, we would subtract
one-half the number he marked incorrectly (3) from the total
number marked correctly (14), thus leaving his final score 11,
which is according to hypothesis.

If the number of answers to choose from was four in- 
stead of three and a student actually knew, let us say, 16 
of the parts, his final score would be determined as follows:
He would guess at four of them and get 25 per cent, or 
one, of them, right by chance. Therefore, his paper would 
have 17 parts marked correctly. Since he marked only 
one in four right by chance the number marked incorrectly 
would be three times as large as the number marked right 
by chance. Therefore, one-third of the number marked 
wrong, or one, would be the number to be subtracted from 
his total score in order to get his actual score. 
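
The hypothetical cases worked out above all follow one rule: from the number marked right, subtract the number marked wrong divided by one less than the number of answers offered. A minimal sketch of that rule follows. The function names are illustrative and not from the text, and it is assumed, as in the text's cases, that the pupil marks every part and that his guesses fall out exactly as chance predicts.

```python
from fractions import Fraction


def corrected_score(num_right: int, num_wrong: int, choices: int) -> Fraction:
    """Number right less the share of wrong marks that chance would match."""
    if choices < 2:
        raise ValueError("each part must offer at least two answers")
    return Fraction(num_right) - Fraction(num_wrong, choices - 1)


def expected_paper(known: int, total: int, choices: int):
    """Right and wrong marks expected when the pupil knows `known` parts
    and guesses at all the rest (hypothetical helper for the cases above)."""
    guessed = total - known
    right_by_chance = Fraction(guessed, choices)
    return known + right_by_chance, guessed - right_by_chance


# The four cases of the text: (parts known, parts in test, answers offered).
for known, total, choices in [(10, 20, 2), (16, 20, 2), (11, 20, 3), (16, 20, 4)]:
    right, wrong = expected_paper(known, total, choices)
    print(known, right, wrong, corrected_score(right, wrong, choices))

print(corrected_score(14, 6, 3))  # 11
```

In each case the corrected score returns the number of parts the pupil was supposed to know. With an actual paper the guesses will not divide so evenly, and the corrected score is then only an estimate.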

Some Objections to Tests of This Kind. — It may be
argued that as much information cannot be obtained by
this method as the old method because a child told to
discuss a certain topic would present knowledge that would
take a dozen or more questions to bring out by the new
method. There seems to be much truth in this criticism.
It may be, however, that enough additional questions may
be asked by the new method to offset the apparent loss in
giving up the old system.

It is also argued that an examination of this kind is
more or less superficial and does not involve, to any great
extent, the higher thought processes such as reasoning,
imagination, etc. It would seem, however, that these processes
must go on the same as by the other method. The
only difference is that the new method does not impose
the mechanical operation of actually writing out the
answers. Some claim that the choice of words and the
drill in sentence formation aids thought and that this is
all lost when the answers are ready made. They claim that
abstractions, comparisons, and reasoning do not take place
to the same degree in the new method as in the old. It
seems to the writer that these criticisms are not well
founded.

Effect of Incorrect Statements Being Placed before the 
Student. — Another criticism is that because incorrect 
statements are placed before the children the psychological
influence is bad, the false statements being taken
as truths. In answer to this criticism it may be said that
the facts do not warrant the statement that this condition
prevails to an appreciable degree. Moreover the child
would bring into consciousness the right and wrong
answers in forming his judgments by the old method of
giving examinations. 

It is true that this method does not show where the
reasoning goes wrong or ceases altogether; but it does save
students the agony and perspiration necessary to perpetrate
an answer like the following cited by McCall. A
student was asked the following question in a recent course
in educational measurements: "Which three of the tests
described by Whipple do you think would be of most
service in an elementary school, if your school had a
psychologist to apply them?" The answer was:

The tests described by Whipple embrace most of the difficulties
that would be embraced in problems of classroom instruction, I
think his tests embrace a great variety of methods of approach
and it seems difficult for me to think of just three to whom the
presence of a psychologist in a school would give help. I would
think it would be tests in which knowledge of the workings of
a child's mind and its growth and development would be most
apparent since those not particularly trained might focus on
others not of this kind. I feel it would be unwise to specifically
mention just three when the number is so great which would
fulfill all these requirements. Every teacher to be a psychologist
would help all classroom measurement work of whatever kind
greatly, I know since we cannot know of the influence of a test
upon which any group except by the mental reaction produced.

In further support of the true-false examination it is
maintained that it will promote a better feeling between
the teachers and pupils; pupils will no longer strive to
tell what they do not know; it eliminates the personal equation
and makes a more pleasant atmosphere in general.

The Values to Be Assigned to the Scores. — In the foregoing
discussion of scoring, the test questions have been
difficulty questions, the problem being, "How hard a ques- 
tion can the pupil answer?" The answers were either 
right or wrong and each part was given an arbitrary value 
of 1. It is evident that some of these questions are more 
difficult than others. The question, therefore, arises as to 
whether or not the scores should be weighted so that credit 
may be given according to the relative difficulty of the 
various parts of the test. The defense of giving every part
an arbitrary value of 1 is that there is such a high
correlation between the scores when the questions are
weighted and when they are not that the additional accuracy is not
worth what it costs to get it.

Charters found a similar condition in making his lan- 
guage tests and did not weight his scores for that reason. 

Some experimentation was done in the School of Education,
University of Oregon, upon the correlation between
the scores made where each part of the test is weighted
according to its difficulty and where the parts were given
an arbitrary value of 1. The data were taken from the
Douglass Standard Diagnostic Tests for Elementary
Algebra, the Monroe Standardized Silent Reading Tests,
and the Kansas Silent Reading Tests devised by Dr. F. J.
Kelly. From random samples taken from papers in each
of these tests, the correlation between the weighted scores
and the scores where each part was given an arbitrary
value of 1 was, in each case, taken as 0.9 or above.
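
The kind of comparison described may be sketched as follows. The papers and weights below are invented for illustration and are not the Oregon data; the correlation is the ordinary product-moment coefficient, which is assumed here to be the measure meant.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical weights for five parts of rising difficulty.
WEIGHTS = [1.0, 1.5, 2.0, 2.5, 3.0]

# Hypothetical papers: 1 marks a part answered right, 0 a part missed.
PAPERS = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
]

unweighted = [sum(paper) for paper in PAPERS]
weighted = [sum(w * mark for w, mark in zip(WEIGHTS, paper)) for paper in PAPERS]

r = pearson_r(unweighted, weighted)
print(f"correlation between weighted and unweighted scores: {r:.2f}")
```

For this invented set the two ways of scoring rank the pupils nearly alike, which is the sort of result the text reports; a different set of papers would of course give a different coefficient.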
General Problem of Weighting Scores. — In test making,
however, if one wishes to weight the scores it may be done
by determining the relative difficulty of each part of the
test. This is usually done on the principle that the larger
the number of children missing a part, the more difficult 
that part is and the larger score it should have. The 
following illustration from spelling will make the point 
clear. In giving the ordinary spelling examinations or 
tests the usual method of scoring is to mark the papers 
on a basis of 100 per cent. If there were 20 words, each 
word spelled correctly is given an arbitrary value of 5. It 
might so happen that two of the words in the test were 
"home" and "separate." The grading of the papers 
might show that 98 per cent of the fifth grade in a certain 
city system were able to spell the word "home" correctly 
whereas only 40 per cent were able to spell the word 
"separate" ; yet, by the old method of marking the papers, 
a child would be given 5 per cent towards his final grade 
if he spelled the word "home" correctly and the same 
amount if he spelled the word "separate" correctly. It 
is evident that since the spelling difficulty of the latter is 
greater than that of the former, it should receive a higher 
score. If this is done it is called weighting the word on 
a basis of its spelling difficulty, which is probably the most 
scientific method of weighting the various questions or 
parts of tests in the tool subjects. The question of
how much weight to give to a particular part of a test
is generally determined in one of two ways.

1. By the teacher's judgment. — This is the usual method
followed by teachers in giving the ordinary examination.
If the scoring is to be done by the per cent method the
teacher may arbitrarily say that question number 1 is
worth 8 per cent and question number 2 is worth 12 per
cent, and so on. She may weight the questions in this
way on a basis of difficulty, her estimate being that ques-
tion 2 is one-and-a-half times as difficult as question 1, or
she may assign these weights, not because question 2 is 
more difficult than question 1, but because she thinks it 
is of more importance for social or other reasons that the 
children should know question 2. This method is rarely 
followed in test making. 

2. By weighting the parts according to the distribution
of abilities as shown by the normal frequency curve. — In
this case the per cent of pupils missing each part is deter-
mined and these per cent values are transmuted into units
of standard deviation (sigma) or probable error (P.E.).
There are many methods that may be followed in doing 
this. One of the most common is perhaps to assign an 
arbitrary value of 1 to the easiest word or question in the 
test. This is determined as indicated above by finding the 
number of pupils who are able to answer it as compared 
with the rest of the questions in the test. The procedure 
may be illustrated as follows: Suppose the easiest word 
in a spelling test is spelled by 90 per cent of the pupils 
and the next word in order of difficulty is spelled by 80
per cent of the pupils. What should be the weighting 
assigned to these two words? If we desire to measure all 
the words in terms of the easiest word we may assign an 
arbitrary value of 1 to the first word. 

Tables have been prepared so that it is possible to con-
vert percentile scores into units of standard deviation and
weight them according to a normal distribution. For in-
stance, in the above example a word missed by 10 per cent
of the pupils is given a value of 1.73 (approximately), and 
one missed by 20 per cent is given a value of 2.16 (approxi- 
mately). Therefore, the relative weights assigned to the 
two words are to each other as 1.73 is to 2.16. If we give 
the first word an arbitrary value of 1, then the weight or 
value of the second word would be 1.25.* There is no par- 
ticular reason why the easiest word or question should be 
given an arbitrary value of 1 other than the fact that some 
fractions may be avoided by this procedure. 
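
The conversion from per cent missing to sigma units can be sketched as follows. The text does not say where the printed tables place their zero point. The sketch assumes it lies 3 sigma below the mean, a reading that comes close to the quoted values of 1.73 and 2.16 but remains an assumption; the function names are introduced here for illustration.

```python
from statistics import NormalDist

# Assumption: the scale's zero point lies 3 sigma below the mean.
ZERO_POINT_SIGMA = 3.0


def sigma_value(percent_missing, zero_point=ZERO_POINT_SIGMA):
    """Difficulty in sigma units for an item missed by the given per cent."""
    z = NormalDist().inv_cdf(percent_missing / 100.0)
    return zero_point + z


def relative_weights(percents_missing):
    """Express each item's sigma value relative to the easiest item (weight 1)."""
    values = [sigma_value(p) for p in percents_missing]
    easiest = min(values)
    return [v / easiest for v in values]


values = [sigma_value(10), sigma_value(20)]
weights = relative_weights([10, 20])
print([round(v, 2) for v in values], [round(w, 2) for w in weights])
```

Under this assumption the two words come out near 1.72 and 2.16, and the second word's relative weight is close to the 1.25 given in the text. Small differences from the printed figures are to be expected from rounding in the tables.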

Accumulation Scores and Scores of Greatest Difficulty.
— There are two general ways of determining the final
scores of pupils. One method is to give each question or
part a weighted value and let the sum of the scores of the
various parts constitute the final score of the pupil. This
is the method used in the Kansas Silent Reading Tests
devised by Dr. Kelly, the Monroe Reading Tests, the 
Douglass Algebra Tests, and many others. The final score 
given a pupil is the accumulated values or weights given 
to each part. This procedure is followed also where the 
parts are not weighted but each part is given an arbitrary 
value of 1 as in the Courtis arithmetic tests. Here each 
problem is given a value of 1 and a pupil's score is the 
sum of the problems solved correctly. 

The other method determines the final score of a
pupil by the weighted value of the most
difficult problem a pupil can solve. The Woody Arithmetic
Tests illustrate this method of scoring. The final score
given to a pupil is not determined by finding the sum of
the weighted values of all the problems, but is simply the
weighted value of the most difficult problem solved.
The same principle is followed in the Thorndike Hand-
writing Scale. The score of the pupil is the highest quality
reached and not the accumulation of the quality values
ranging from the lowest to the highest quality reached by
the examinee.
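
The two ways of arriving at a final score may be set side by side in a short sketch. The weights and the pupil's record are hypothetical, and the function names are introduced here only for illustration.

```python
def accumulation_score(weights, solved):
    """Sum of the weights of all parts answered correctly."""
    return sum(w for w, ok in zip(weights, solved) if ok)


def greatest_difficulty_score(weights, solved):
    """Weight of the most difficult part answered correctly (0 if none)."""
    return max((w for w, ok in zip(weights, solved) if ok), default=0)


# Hypothetical weights for four problems, easiest first, and one pupil's record.
weights = [1.0, 1.25, 1.6, 2.1]
solved = [True, True, False, True]

accumulated = accumulation_score(weights, solved)
hardest = greatest_difficulty_score(weights, solved)

# With every part valued at 1, the accumulation score is simply the number right.
number_right = accumulation_score([1] * len(solved), solved)
print(accumulated, hardest, number_right)
```

The same pupil thus receives different final scores according to the method chosen, which is why the method should be stated along with the score.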

BIBLIOGRAPHY

1. Burgess, May Ayres, The Measurement of Silent Reading
(Department of Education, Russell Sage Foundation, 1921).

2. Burt, C., The Distribution and Relations of Educational
Abilities (King and Son, London).

3. Courtis, S. A., The Gary Public Schools; Measurement of
Classroom Products (General Education Board, New York, 1919).

4. Gray, C. T., A Score Card for Measuring Handwriting,
Bulletin No. 17 (The University of Texas, 1915).

5. Haggerty, M. E., The Intelligence Examination (World
Book Co.).

6. Haggerty, M. E., "Recent Developments in Measuring
Human Capacities," Journal of Educational Research, Vol. 3,
April, 1921.

7. Healy, W. O., The Individual Delinquent (Little, Brown
& Co., 1915).

8. Hollingworth, Leta S., Vocational Psychology (D. Apple-
ton & Co., 1916).

9. "Intelligence and its Measurements; A Symposium," Journal
of Educational Psychology, Vol. 12, March and April, 1921.

10. Judd, Charles H., Measuring the Work of the Public
Schools, Cleveland Educational Survey (Russell Sage Founda-
tion, New York, 1916).

11. Link, H. C., Employment Psychology (The Macmillan Co.,
1916).

12. McCall, C. A., "A New Kind of School Examination,"
Journal of Educational Research, Vol. 1, pp. 33-46.

13. McCall, C. A., "Proposed Uniform Method of Scale Con-
struction," Teachers College Record, Vol. 20, No. 1, January,
1921, pp. 31-51.

14. Münsterberg, Hugo, Psychology and Industrial Efficiency
(Houghton Mifflin Co., 1913).

15. National Intelligence Tests, prepared by Haggerty, Ter-
man, Thorndike, Whipple and Yerkes (World Book Co.).

16. National Society for the Study of Education, the various
Yearbooks (Public School Publishing Co., Bloomington, Ill.).

17. Otis Group Intelligence Scale, designed by Dr. Arthur S.
Otis (World Book Co.).

18. Pintner, Rudolf, and Anderson, Margaret M., The Pic-
ture Completion Test.

19. Pintner, Rudolf, and Patterson, Donald, A Scale of
Performance Tests (D. Appleton & Co., 1917).

20. Rossolimo, "Mental Profiles; A Quantitative Method of
Expressing Psychological Processes in Normal and Pathological
Cases," Journal of Experimental Pedagogy, Vol. 1.

21. Rugg, Harold O., "Scientific Method in the Reconstruction
of Ninth-Grade Mathematics," Supplementary Educational Mono-
graphs, Vol. II, No. 1, 1918.

22. Rusk, Robert R., Experimental Education (Longmans,
Green & Co., 1919).

23. Terman Group Tests of Mental Ability, designed by Lewis
M. Terman (World Book Co.).

24. Terman, Lewis M., The Measurement of Intelligence
(Houghton Mifflin Co., 1916).

25. Yerkes, Bridges, and Hardwick, A Point Scale for Meas-
uring Mental Ability (Warwick and York, 1915).




THE MEASUREMENTS OF EDUCATIONAL PROCESSES AND
PRODUCTS IN FIVE FIELDS OF SCHOOL WORK



In the latter part of Chapter II the entire field of tests 
and measurements was arbitrarily divided into seven divi-
sions. The amount of work done in each division seemed
to warrant such an arbitrary classification. It should not 
be inferred that the divisions made are the only divisions 
into which the field of measurements could be divided. It 
does seem, however, that such a classification is quite ex- 
clusive and there is comparatively little overlapping in 
these fields. Enough work has been done in each of them
to give a great mass of data which are beginning to throw 
a great deal of light on the processes and products of 
education. In the last five chapters we have discussed 
and criticized the measurements of intelligence and the 
measurements of school achievements. In this chapter we 
shall discuss, very briefly, the other five fields, not with 
an idea of treating any one of them exhaustively but simply 
to call attention to the work that is being done along the 
lines mentioned in Chapter II. The fields are: (1) the 
measurements of the materials of instruction ; (2) the meas- 
urements of the physical growth of school children; (3) 
the measurements of the money cost of education; (4) the 
measurements of school buildings; and (5) the measure- 
ments of retardation, acceleration, and elimination. 

With the possible exception of the fourth category, suffi- 
cient facts and data are extant to form the basis of a 
large volume, and, in some cases, many volumes, on each of
these fields of measurement. It was also pointed out in
Chapter II that each of these fields may be subdivided into
smaller divisions and that some are combined to form new
fields. Tests in vocational guidance, for instance, may in-
volve measures of intelligence, measures of school achieve-
ments, and physical measurements. Other measures are
similarly combined to form new and definite fields. With-
out further discussion of the broad general field of meas-
urements we shall deal more specifically with the various
divisions as outlined. 

I. Measurements of the Materials of Instruction

It is only within the last decade that any considerable 
work has been done in measuring the materials of instruc- 
tion. To some it has seemed like educational pedantry 
to count the words in a spelling book, or to score a text- 
book of any kind in order to get an analytic conception 
of its contents. On the other hand, the folly of presenting
material year after year with little knowledge of its con- 
tent and with no quantitative conception as to the relative 
amounts of the various elements that compose it has led 
many students to an intensive study of the materials of 
instruction. 

Determination of a Spelling Vocabulary. — It was by
studies of this kind that a spelling vocabulary of school
children and also of adults was determined. Each indi- 
vidual of school age, or, at least, after he has passed the 
first two or three years of his school life, has four vocabu- 
laries: (1) reading, (2) speaking, (3) hearing, and (4) 
spelling, or writing. Spelling being a tool, there would 
be no need for learning to spell words unless an individual 
would use these words when he writes. The problem then 
is to discover what words an individual uses when he
writes. This can be done only by taking the written com-
positions of those who write and analyzing them in order
to determine the words used and their frequency. These
compositions may include business letters, friendship let-
ters, compositions written in school, the composition of a
daily newspaper, and many other types of material.

Since it is not possible to teach all the words that one 
may use when he writes, the best that can be done is to 
determine the words that occur in greatest frequency and 
teach them as a minimum word list. Many perplexing 
problems present themselves in determining the proper
word lists, such as: How may one find out what words
should be taught? Ayres ¹ made up his list of 1,000 words
by combining the results of four studies in spelling. One
study was made by Rev. J. Knowles of London, England,
and was published in pamphlet form under the title, "The
London Point System of Reading for the Blind" (1904).
In making this study the author took passages from the 
Bible and other literature, containing in all 100,000 words, 
and from this list took the 353 words of the greatest fre- 
quency. 

The second study was made by R. C. Eldridge of Niagara
Falls. Eldridge made an analysis of 250 different articles
taken from four issues of four Sunday newspapers pub-
lished in the city of Buffalo. He found that they con-
tained a total vocabulary of 6,002 different words and
43,989 running words.

The third study was made by Ayres and published by
the Russell Sage Foundation in a monograph entitled The
Spelling Vocabularies of Personal and Business Letters.
This study consisted of a tabulation of 23,629 words from
2,000 short letters written by 2,000 people. The total
vocabulary used was found to consist of 2,001 different
words. The number of appearances of each was reported
in the monograph.

¹ A Measuring Scale for Ability in Spelling (Division of Education,
Russell Sage Foundation, 1915).

The fourth study was made by Cook and O'Shea and
the results presented in a book entitled The Child and His
Spelling, published in 1914. This study consists of a tabu-
lation of approximately 200,000 words taken from the
family correspondence of 13 adults. The total vocabulary
was found to be 5,200 different words.

The list of 1,000 commonest words in the Ayres Scale
was finally selected from these four studies by finding
the frequency with which each word appeared in the four
studies, weighting that frequency according to the size
of the base of which it was a part, adding the four fre-
quencies thus obtained, and finding their average.
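
One reading of this procedure can be sketched as follows. It assumes that "weighting according to the size of the base" means converting each count to a rate per 100,000 running words before averaging; the text does not give Ayres's exact arithmetic. The study sizes are those stated above, while the word counts are hypothetical.

```python
# Running words tabulated in each study, as stated in the text
# (the Cook and O'Shea figure is approximate).
STUDY_SIZES = {
    "Knowles": 100_000,
    "Eldridge": 43_989,
    "Ayres": 23_629,
    "Cook and O'Shea": 200_000,
}


def combined_frequency(counts, sizes=STUDY_SIZES, per=100_000):
    """Average of a word's per-study rates, each scaled to `per` running words.

    Assumption: weighting by the size of the base is taken to mean dividing
    each raw count by the running words of the study it came from.
    """
    rates = [counts.get(study, 0) * per / size for study, size in sizes.items()]
    return sum(rates) / len(rates)


# Hypothetical raw counts of one word in the four studies (illustrative only).
hypothetical_counts = {
    "Knowles": 500,
    "Eldridge": 220,
    "Ayres": 118,
    "Cook and O'Shea": 1000,
}

average_rate = combined_frequency(hypothetical_counts)
print(round(average_rate, 1))
```

Scaling before averaging keeps the largest study from swamping the smaller ones, which appears to be the purpose of the weighting.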

Anderson ¹ attempted to determine the spelling vocabu-
lary by analyzing the words found in 5,000 letters gathered
by school children from the various sections of the state 
of Iowa and incorporating the words of greatest frequency 
into a minimum spelling list. His list contains approxi- 
mately 5,000 words. 

Another problem that must be solved is the question as 
to how great a frequency a word must have for each 
100,000 running words before it is incorporated in the 
minimum spelling list. This must be decided arbitrarily.
For instance, suppose a word occurs but once in 100,000
running words. Is that frequency sufficient to justify its
being taught in the elementary school? Or, should the
writer be referred to the dictionary to find how to spell
a word whose frequency is but 1 in 100,000 running words?
It is probably safe to say that a word with a frequency of
less than 3 in 100,000 running words should not occur in
a speller for the elementary school.
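
The cut-off suggested here amounts to a simple filter on the rate of occurrence. The sketch below uses hypothetical counts, and the function names are introduced only for illustration; the threshold of 3 per 100,000 is the one proposed in the text.

```python
def rate_per_100k(count, running_words):
    """Occurrences of a word per 100,000 running words."""
    return count * 100_000 / running_words


def minimum_list(counts, running_words, threshold=3.0):
    """Words whose rate per 100,000 running words reaches the threshold."""
    return sorted(
        word
        for word, count in counts.items()
        if rate_per_100k(count, running_words) >= threshold
    )


# Hypothetical tabulation of 200,000 running words (illustrative only).
counts = {"the": 9000, "home": 150, "separate": 8, "zephyr": 2}

kept = minimum_list(counts, running_words=200_000)
print(kept)
```

Here "zephyr" occurs once per 100,000 running words and is left to the dictionary, while "separate" occurs four times per 100,000 and is kept.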



¹ The Determination of a Spelling Vocabulary Based upon Written
Correspondence (Ph.D. thesis, University of Iowa, 1917).




Having determined the list of words that should be
taught in the elementary school, the next problem is to
determine the order in which they should be taught. This
again involves measurements in order to properly grade
the words. Several factors enter into this problem. Words
might be graded on a basis of their spelling difficulty, the
easiest words being assigned to the lower grades, and the 
more difficult ones to the upper grades. Or again they 
might be graded on the basis of use. Children in the 
lower grades do not use quite the same vocabulary as 
children in the upper grades. Jones * found that when 
children, from grades 2 to 8 inclusive, were asked to write 
compositions, the average vocabularies from grade to grade 
were as follows : 

Grade    Number of Words
  4          1,235
  5          1,489
  6          1,710
  7          1,926
  8          2,135

If words were selected exclusively on a basis of use, the
words used by children in the second grade should con- 
stitute the spelling vocabulary for that grade and addi- 
tional words used by children in the third grade, together 
with those left over from the second grade, would con-
stitute the words for the third grade, and so on. Use,
frequency, and difficulty are the chief factors which must 
determine the grading of the words in the subject of spell- 
ing. Extensive measurements have been made in all three 
factors and the grade in which a word is now presented
is no longer a mere matter of opinion but is the result of
scientific investigation. 

A Study of the Reading and Spelling Vocabularies of
Books Used in the First Three Grades. — Perhaps the most
thorough and complete study of the words that children
of the first three grades are called upon to read and spell
has been made under the direction of the author by Miss
Ruth Chase.

The problem originated in the following way: The Text-
book Commission of the state of Oregon met and adopted
a list of books to be used by children in the first three 
grades of the elementary school for a period of six years. 
The books, with the number of pages in each, are given 
below: 

1. Beacon Primer, 124 pages
2. Natural Primer, 122 pages
3. Beacon First Reader, 130 pages
4. Natural First Reader, 136 pages
5. Natural Second Reader, 256 pages
6. Natural Third Reader, 304 pages
7. Hamilton's Essentials of Arithmetic, 124 pages
8. New World Speller, Reading Vocabulary
9. New World Speller, Spelling Vocabulary, 124 pages*

The Problem. — The study was made to determine: (1) 
the number of different words a child would be called 
upon to read, spell, and understand, to meet the minimum 
requirements of the state course of study for the state 
of Oregon; (2) how rapidly a child's reading and spell- 
ing vocabularies grow, assuming that the work is taken 
up in the order indicated by the state course of study; (3) 
the frequency of the words used; (4) the correlations be-
tween the books used, that is, the number of words common
to all the books; (5) the number of running words in the
series.

* The New World Speller was divided into two parts for purposes of
scoring. The eighth book mentioned above deals with the reading

Terms Used in the Study. — Different words means the
number of separate words used in each book. If the proper
name "Mary Jones" were used, it is listed as two words. 
The plural and singular of a word were counted as two 
words. 

There were 503 different words used in the Natural 
Primer, although 265 of these were not counted as new 
words because they had been used in the Beacon Primer.

New words. A word used for the first time is called 
a new word. In the Beacon Primer each word was counted 
as a new word because this book was the first one used 
in the series. The sum of the new words used in each 
book is the total number of different words used in all the 
books. 

Used words are words which have occurred in an earlier 
book of the series. The Natural Second Reader contains 
1,770 different words, of which 861 are new and 909 are 
used words.

Running words means the number of times all the words 
occur. For example, if 12 different words are found and 
the sum of their frequencies is 49, the number of running 
words is 49. 
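
The four terms just defined can be made concrete with a small tally. The sketch below is not the procedure used in the study, which was carried out by hand; the two "books" are invented sentences, and folding capitals to lower case is an assumption. Singular and plural forms are kept apart, as in the study.

```python
from collections import Counter


def tally_series(books):
    """Tally running, different, new, and used words for books read in order.

    `books` is a list of (title, text) pairs. Words are split on whitespace
    and lower-cased (an assumption); singular and plural remain distinct.
    """
    seen = set()
    rows = []
    for title, text in books:
        words = text.lower().split()
        different = set(Counter(words))
        new = different - seen      # words met for the first time
        used = different & seen     # words already met in an earlier book
        seen |= new
        rows.append({
            "title": title,
            "running": len(words),
            "different": len(different),
            "new": len(new),
            "used": len(used),
            "cumulative": len(seen),
        })
    return rows


# Two invented "books" (illustrative only).
books = [
    ("Book 1", "the cat saw the dog"),
    ("Book 2", "the dog saw a cats"),
]

rows = tally_series(books)
for row in rows:
    print(row)
```

As the definitions require, the new words of all the books add up to the total number of different words in the series.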

The Results. — Table III shows the distribution of words 
in each book. The numbers at the heads of the columns
refer to the readers named above: 



Table III. — Distribution and Size of Vocabularies

[Columns 1 to 9 correspond to the books listed above; the rows give the
number of running words, different words, used words, and new words.
The entries are not legible in this copy.]



The study reveals the following facts: 

(a) 4,977 different words constitute the child's minimum read-
ing vocabulary, to which are added 213 words he must learn to
spell which are not found in his reading vocabulary, making a
total of 5,190 different words.

(b) 106,121 is the number of running words.

(c) 289 words, or 5.8 per cent of the child's reading vocabulary,
are used 75,591 times and constitute 71.2 per cent of the running
words in the reading vocabulary.

(d) 13 words occur 27,458 times.

(e) 10 words occur 24,520 times.

(f) 1,470 words, or 29.9 per cent of the words, occur but once.

(g) 1,453 words were listed in the speller 6,767 times, or an
average of 4.6 times.

(h) 3,218 words occur 29,060 times, an average of 9 times.

(i) 289 words occur 75,591 times, an average of 261 times each.

(j) 213 words were found in the speller which were not found
in the readers.
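
The figures in this list hang together arithmetically, as a short check shows. The grouping into three frequency classes is inferred from items (c), (f), and (h), and the variable names are introduced only for illustration.

```python
reading_vocabulary = 4977      # item (a)
speller_only = 213             # items (a) and (j)
running_words = 106121         # item (b)

# (number of different words, total occurrences) for three frequency classes.
frequent = (289, 75591)        # items (c) and (i)
middle = (3218, 29060)         # item (h)
once = (1470, 1470)            # item (f): each word occurs a single time

total_vocabulary = reading_vocabulary + speller_only
words_sum = frequent[0] + middle[0] + once[0]
occurrences_sum = frequent[1] + middle[1] + once[1]

pct_frequent_words = round(100 * frequent[0] / reading_vocabulary, 1)
pct_frequent_running = round(100 * frequent[1] / running_words, 1)
# Against a base of 4,977 this gives about 29.5; the text prints 29.9.
pct_once = round(100 * once[0] / reading_vocabulary, 1)

print(total_vocabulary, words_sum, occurrences_sum)
print(pct_frequent_words, pct_frequent_running, pct_once)
```

The three classes account for all 4,977 different words and all 106,121 running words, and the percentages in item (c) are reproduced.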

Table IV. — Thirteen Words of Greatest Frequency Found in
the Readers Compared with the Ayres ¹ List

Rank in Readers   Frequency in Readers   Rank in Ayres' List
  1. the               7,927               1. the
  2. and               3,142               2. and
  3. to                2,441               3. of
  4. a                 2,336               4. to
  5. he                1,790               5. I
  6. of                1,714               6. a
  7. I                 1,629               7. in
  8. in                1,473               8. that
  9. you               1,385               9. you
 10. was               1,343              10. for
 11. said              1,069              11. it
 12. it                  942              12. was
 13. is                  927              13. is
                      27,458

¹ A Measuring Scale for Ability in Spelling (Division of Education,
Russell Sage Foundation, 1915).

It is interesting to note that the 13 most frequent words
found in this study are identical with those of the Ayres
list except one. This list contains the word "said," which
is not found in the Ayres list; and the word "that" is
found in the Ayres list but not in the thirteen most fre-
quent words in this list.

The Contents of Three American Histories. — Another
study made under the direction of the author that illus-
trates measurements of the materials of instruction is the
scoring of three textbooks in American history. In June,
1919, the Textbook Commission for the state of Oregon
readopted Mace's School History of the United States for
a period of six years. The book had been in use for a
number of years in Oregon and was generally conceded
to be a satisfactory text in United States history. From the
fact that it was to be the official text in American history
for the next six years it was thought that its usefulness
might be increased if its contents were scored in reference
to some of the most salient features that are being dis-
cussed in the reorganization of history courses in the
grades. With this thought in mind, a class of advanced
and graduate students taking a course in the elementary
curriculum with the author at the University of Oregon
undertook as a special problem to score the book in ref-
erence to four salient features. In order to provide some
standards for evaluation and comparison, two of its com-
petitors were scored with it. These books were History
of the American People by Beard and Bagley, and History
of the United States by Gordy. The scoring was done in
reference to the following points:
1. Names of places — scored under the following headings:
   (a) Names of continents and countries
   (b) Names of states and territories
   (c) Names of rivers
   (d) Names of cities and towns
   (e) Names used in connection with military events
   (f) A miscellaneous list which did not fit under any of the
       above headings

2. Names of men — scored under the following headings:
   (a) Explorers and discoverers
   (b) Rulers, presidents, and governors
   (c) Names mentioned in connection with industry and in-
       vention
   (d) Names of statesmen
   (e) Names mentioned in reference to military matters

3. Dates

4. Amount of space devoted to political, social and economic,
   and military matters; also the number of pages devoted to
   pictures, maps, and illustrations.

Table V. — Names of the Ten Countries of Greatest Frequency

       Mace                 Beard and Bagley              Gordy
Name          Frequency   Name          Frequency   Name          Frequency
England          250      United States    333      England          220
United States    129      England          122      United States    158
France            91      France            93      France            62
Spain             77      Great Britain     82      Spain             58
Canada            44      Spain             68      Mexico            27
Mexico            42      Germany           60      Canada            18
India             16      Mexico            35      Cuba              15
Great Britain     16      Russia            24      China             11
Holland           14      China             21      Japan             11
Alaska            12      Cuba              19      East India        10



Table VI. — Summary of Countries and Continents Mentioned

                                         Mace   Beard and Bagley   Gordy
Total number of countries mentioned        42          55            48
Number mentioned but once                  14          15            16
Total number of mentions in each text     785       1,069           687
Number of continents mentioned              6           6             5
Frequency of mention of continents        286         224           161



Table VII. — Names of the Ten States and Territories of
Greatest Frequency

       Mace                 Beard and Bagley              Gordy
Name          Frequency   Name          Frequency   Name          Frequency

Virginia
New York
Pennsylvania
South Carolina
Tennessee
Kentucky
New Jersey

138 
75 
72 
58 
44 
42 
36 
36 
36 
33 


Virginia 

Massachusetts 
Pennsylvania . 
New York.... 
South Carolina 


86 

64 
43 
39 

38 
38 
34 
33 


Virginia
Massachusetts
New York
South Carolina
Georgia
Louisiana
Connecticut
North Carolina
Florida
Pennsylvania


76 
57 
51 
42 
25 


rcMB 

California
Kentucky
Illinois


24 
21 
19 
19 



Mace mentions 51 states and territories with a total
frequency of 1,066; 49 of them he mentions two or more
times. Beard and Bagley mention 51 states and territories
with a total frequency of 1,085, all of which are mentioned
two or more times. Gordy mentions 48 states and terri-
tories with a frequency of 572, eight of which are men-
tioned but once.

Table VIII records the number of rivers mentioned in
the three texts. Mace mentions 56 rivers with a total fre-
quency of 256, of which 32 appear but once. Beard and
Bagley mention 35 with a total frequency of 142, of which
19 appear but once. Gordy mentions 43 with a total fre-
quency of 219, of which 22 appear but once.

Table IX records the number of cities appearing in the
three texts with their frequencies. Mace mentions 153
cities with a frequency of 720, 77 of which are mentioned
but once, and 26 are mentioned twice. Beard and Bagley
mention 186 cities with a total frequency of 6S2, of which 






Table VIII. — Names of the Ten Rivers of Greatest Frequency

       Mace                 Beard and Bagley              Gordy
Name          Frequency   Name          Frequency   Name          Frequency
Mississippi       60      Mississippi       47      Mississippi       60
Hudson            25      Ohio              21      Hudson            34
Ohio              24      Hudson             9      Ohio              28
Potomac           24      Delaware           7      Shenandoah        10
Delaware          13      Potomac            7      St. Lawrence       8
St. Lawrence      11      Columbia           5      Mohawk             8
Connecticut        8      Missouri           4      Potomac            5
Rio Grande         8      Rio Grande         4      Connecticut        5
James              6      Arkansas           3      Tennessee          4
Niagara            5      St. Lawrence       3      Delaware           4



105 appear but once, and 30 are mentioned twice. Gordy 
mentions 140 with a total frequency of 529, of which 72 
appear but once, and 16 appear twice. 



Table IX. — Names of the Ten Cities of Greatest Frequency

       Mace                 Beard and Bagley              Gordy
Name          Frequency   Name          Frequency   Name          Frequency
New York          65      New York          68      New York          41
Philadelphia      60      Philadelphia      48      Boston            34
Boston            59      Boston            42      Philadelphia      32
Washington        45      New Orleans       26      Washington        31
Charleston        31      Chicago           25      New Orleans       19
Chicago           25      Washington        22      Richmond          14
London            22      Charleston        18      Charleston        14
New Orleans       21      St. Louis         17      Hartford          12
Albany            17      Pittsburgh        16      Baltimore         10
St. Louis         17      Buffalo           12      Albany            10


Table X records the places mentioned in reference to
military events. Mace mentions 172 places with a fre-
quency of 433, 81 of which occur but once, and 34 are
mentioned twice. Beard and Bagley mention 127 places
with a frequency of 260, of which 70 occur but once, and
34 twice. Gordy mentions 184 places with a frequency of
560, of which 85 occur but once, and 31 are mentioned
twice.

Table X. — Names of the Ten Places of Greatest Frequency
Appearing in Connection with Military Affairs

       Mace                 Beard and Bagley              Gordy
Name          Frequency   Name          Frequency   Name          Frequency


Vicksburg
Gettysburg
Yorktown
Concord
Ft. Sumter
Quebec
Trenton
Bunker Hill


18 
14 
9 
9 
9 
8 
8 
8 
8 
7 


Concord 

Richmond

Boston 

New York.... 

D.C ...'... 

Virginia 

Lexington 

Gettysburg 

Santiago 

Philadelphia... 


9 
9 

8 
8 

6 
6 
6 
5 
5 
5 


Richmond 

River 

New York. . . . 
Hudson River . 

Virginia 

Washington,
D.C.

Philadelphia... 

Boston 

South Carolina 


17 

16 
IS 
16 
16 

15 
14 
12 
11 



Table XI gives a miscellaneous list of places which did
not fit into any of the above classifications. Mace men-
tions 140 places with a total frequency of 484, of which
88 places are mentioned but once, 17 twice, and 5 three 
times. Beard and Bagley mention 184 places with a total 
frequency of 696, 112 of which are mentioned once, 25 
twice, and 18 three times. Gordy mentions 136 places with 
a total frequency of 569, 72 of which are mentioned once, 
23 twice, and 18 three times. 



Table XI. — Miscellaneous List of Names of Places Mentioned

       Mace                 Beard and Bagley              Gordy
Name          Frequency   Name          Frequency   Name          Frequency


New England . . 
Cuba 


85 
33 
20 
19 
16 
15 
14 

13 
9 

8 


South 

North 

New England. . 
West 


94 
71 
49 
41 
27 
24 
16 
12 

10 
9 


South 

North 

New England. . 
Allegheny Mts. 
West 


114 
85 


Atlantic Ocean. 
West Indies. . . . 


67 
18 


Confederacy . . . 
N. Netherlands. 


Pacific Ocean.. 
East 


16 


Atlantic Ocean. 
LakeChamplain 
Pacific Ocean.. 

Lake Erie 

New World... 


11 


Philippines .... 
District of Co- 
lumbia 

New Amsterdam 
East Indies 


Confederacy.. . 
Atlantic Ocean. 
Mississippi
Valley

Great Lakes... 


9 

8 
8 
7 



Grand Total of Places Mentioned

Mace 480 

Beard and Bagley 507 

Gordy .284 



To this total should be added a number of places con- 
nected with military events, which would bring the grand 
total up somewhat higher. 

From the foregoing tables one is impressed with the 
vast number of geographical facts a child in the seventh 
and eighth grades must know to read these histories in- 
telligently. 

If mere frequency of mention is in any way indicative 
of the importance of a place, the rivers, cities, places con- 
nected with military events, countries, etc., may thus be 
evaluated. 

In comparing the frequencies of mentions of the cities, 
rivers, countries, states, places mentioned in reference to
military events, and even those in the miscellaneous lists,
one is struck by the great similarity throughout the three
texts. The first ten places receiving the greatest number
of mentions in one text are practically the same as those
in the others.

If this is a fair evaluation, the teacher may be justified
in taking, say, the ten places of highest frequency from 
each of the six tables, making sixty places in all, for more 
or less intensive geographical study. 
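
The selection suggested here, the places of highest frequency from each table, can be sketched as follows. The figures are the first five rows of Mace's columns in Tables V and IX; the function names are introduced only for illustration, and ties are broken alphabetically, which is an assumption.

```python
def top_places(frequencies, n=10):
    """Names of the n most frequently mentioned places, highest first.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


def study_list(tables, n=10):
    """Top n names from every table, keyed by the table's label."""
    return {label: top_places(freqs, n) for label, freqs in tables.items()}


# Mace's figures from Tables V and IX (first five rows of each).
MACE_TABLES = {
    "countries": {"England": 250, "United States": 129, "France": 91,
                  "Spain": 77, "Canada": 44},
    "cities": {"New York": 65, "Philadelphia": 60, "Boston": 59,
               "Washington": 45, "Charleston": 31},
}

selected = study_list(MACE_TABLES, n=3)
print(selected)
```

With all six tables supplied in full and n left at 10, the same call would yield the sixty places proposed for intensive study.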

Table XII is a summary of five tables of men mentioned 
in the three textbooks. 

Space would not permit more than a summary of each 
of these tables. 

Table XII. — Names of Men Mentioned

                                         Mace   Beard and Bagley   Gordy




47 
87 

11 
29 

167 


34 

69 

40 

76 




Rulers, presidents and governors
Names mentioned in connection with industry and invention
66
Names mentioned in reference to military matters

* Data not extant.



Table XIII gives the dates that had a frequency of five 
or more in each of the three texts, the distribution of 
dates in the study, "Possible Defects in the Present Con- 
tent of American History as Taught in the Schools," re- 
ported by Horn in the Sixteenth Yearbook of the National 
Society for the Study of Education, Part I, and also the 
study made by Bagley in the Fourteenth Yearbook in the



Table XIII. — Dates Appearing Five or More Times

[Table XIII: paired Date and Frequency columns for Mace, Beard and Bagley, Gordy, Horn's study, and Bagley's study; the individual entries are illegible in this copy.]

Table XIII is read as follows: The date 1862 occurred
16 times in Mace's History; the date 1860 occurred 39 
times in Beard and Bagley; the date 1862 occurred 17 
times in Gordy; the dates 1844 and 1873 occurred six 
times in Mace ; the dates 1848 and 1863 each occurred five 
times in Beard and Bagley, and so on. 

The following facts in reference to the dates in the three 
history texts are significant. Mace's History contains 326 
different dates; Beard and Bagley, 235; Gordy, 206. It 
is interesting to note that the ranks of the history dates 
found in these three history texts do not agree with those 
found by Dr. Horn in the study of "Thirty-Eight Modern
Crucial Problems" in the above-mentioned study, or with 
the dates selected by specialists in American history and 
reported by Bagley in the Fourteenth Yearbook. The most 
important date found in Mace's History, as far as fre- 
quency is concerned, is 1862; in Beard and Bagley, 1860; 
in Gordy, 1862; in Horn's study, it is 1900; and in 
Bagley's study, 1776. 





Type of Material

Political movements

Social and economic movements

Maps and illustrations

          Mace               Beard and Bagley         Gordy
    Pages   Per Cent       Pages   Per Cent       Pages   Per Cent
            of Book                of Book                of Book

   144.66    30.13        240.50    37.81        148.44    33.96
   150.00    31.28        155.37    24.43         63.22    14.46
    98.40    20.50         39.50     6.21         66.22    16.50
    83.75    17.40         97.95    15.40         80.33    18.88



The above percentages do not include the bibliographies, 



review questions, and other repeated materials in the texts.
One significant thing about these texts is the amount of
space devoted to pictures and maps. Some of the books
have approximately 15 per cent of their space devoted to
pictures. Since the picture cost is more than the cost of
a regular printed page it brings the approximate cost of
the pictures up to 20 per cent of the entire cost of the
book. When one-fifth of the manufacturing cost of a book
is the cost of reproducing pictures, one should be pretty
sure that the pictures are valuable and will be generally
used.

If the contents of these books are compared with the
type of material discussed by Horn, cited above, it will be
seen that in the former relatively much less space is devoted
to social and economic movements. However, it was
not the purpose of the studies to determine the relative
amount of space that ought to be given to these various
phases of American history but rather to determine the
amount of space that was actually being given in these
three texts. The same thing may be said in reference to
the other phases scored.

II. The Measurements of the Physical Growth of
School Children

In the last analysis, education may be reduced to the
process of producing, directing, and preventing changes
in human beings. This process has to do with the physical
changes as well as the mental. It is a platitude to say that
the physical and physiological traits of the individual
primarily condition his total nature; yet the exact relation
that exists between the physical and mental development
of an individual is not known.

Much work has been done on the measurement of the
physical growth of children, largely because some scientists
have believed that measurements of this kind would in
some way furnish a key to mental development. Hundreds
of thousands of school children have been measured
to determine their height, weight, vital capacity, reaction 
time, and other physical traits. Norms for these traits 
for each year of a child's life, until he gets far into the 
'teens, have been established. It would seem that there 
should be a somewhat definite relation existing between 
the development of the nervous system and the mental
maturity of an individual; that frail undersized children 
should be taught somewhat differently from the strong and 
robust; and that courses of study should be reorganized 
to take cognizance of the development of physical traits 
of school children. Yet in spite of the hundreds of 
thousands of measurements that have been made on these 
physical traits, little change that takes cognizance of the 
growth and degree of maturity of these physical traits 
has been made in the organization of school programs. 
Some time in the near future the facts that have been 
obtained by the physical measurements of school children 
may be extremely valuable in the teaching process. To 
date, however, their pedagogical value has been small. To 
state the problem concretely, how should the method of 
the recitation or the type of subject-matter presented to 
children who are exceptionally well developed physically 
differ from that presented to those children who are 
scarcely normal as far as physical development is concerned?
It very frequently happens that those who are
physically inferior exceed those whose physical develop- 
ment is far above the normal, and the work done by those 
physically inferior apparently does not injure them. Of 
course, many valuable facts relative to growth have been 
discovered that are beginning to be utilized in health work 
among children. 



III. The Measurements of the Money Cost of Education

So many measurements have been made in this field,
and the literature is so familiar, that a bare mention of
it is sufficient here. We have made careful records of the
cost of school buildings, school apparatus, teachers'
salaries, and other expenses incident to teaching.
Recently we have worked out norms for supervision, teach- 
ing, janitor service, and the cost per clock-hour to teach 
the various subjects. A school official may now compare 
the distribution of school funds in his city with that in 
other cities and determine whether he is paying too much 
or too little for supervision, janitor service, or any other 
phase of the school work. 

Intensive studies have been made in school costs and 
accounting of school work. This phase of measurements
is of interest primarily to the department of school supervision
rather than to the average room teacher.

IV. The Measurements of School Buildings

The score card for the measurement and standardization
of school buildings came as a result of the movement for
greater efficiency in education. The first one made its
appearance in 1916 and was made under the direction of
Dr. George D. Strayer of Columbia University. A new
Score Card for City School Buildings was made by Strayer
and Engelhardt in 1920. This card has two special purposes:
(1) The scoring of school buildings in the light
of a school building program, to be developed by a city.
(2) The checking of plans for new school buildings. This
score card has grown out of the experience in evaluating
more than 1,000 school buildings and the study of school
building standards. Like many of the scales used in the
measurements of school achievements, it is a score card
based on the combined or median judgments of experts.
A hall one foot too narrow is not wrong in the same sense
that an addition problem is wrong; but it is not the most
efficient precisely because it is the combined opinion of
competent judges that it should be one foot wider.

When one considers the amount of money that is spent 
in the erection of school buildings, he wonders why an 
attempt to standardize school buildings was not made 
sooner. Individuals judging buildings not infrequently 
think mainly in terms of two or three, or possibly a half 
dozen, elements that seem to them of primary importance 
and often neglect other parts of the building that are of 
equal importance. The score card is designed to include 
as nearly as possible all these details that go to make up a 
perfect school building. In assigning weights to the 
various elements the judgment method is resorted to. The 
scoring is done on a basis of 1,000 points; that is, a perfect 
building scores 1,000. The principal divisions, or headings,
for scoring city school buildings with the weights assigned 
to each heading are given below : 

1. Site 125
   (a) Location 55
   (b) Drainage 30
   (c) Size and form 40

2. Building 165
   (a) Placement 25
   (b) Gross structure 60
   (c) Internal structure 80

3. Service systems 280
   (a) Heating and ventilating 70
   (b) Fire protection system 65
   (c) Cleaning system 20
   (d) Artificial lighting system 20
   (e) Electric service system 15
   (f) Water supply system 30
   (g) Toilet system 50
   (h) Mechanical service system 10

4. Classrooms 290
   (a) Location and connection 35
   (b) Construction and finish 95
   (c) Illumination 85
   (d) Cloakrooms and wardrobes 25
   (e) Equipment 50

5. Special rooms 140
   (a) Large rooms for general use 65
   (b) Rooms for school officials 35
   (c) Other special service rooms 40

Each of the above headings has a number of subheadings
which go into detail as to the various parts of the
buildings.
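The arithmetic of such a card can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the weights are those of the list above, with the artificial lighting item taken as 20 so that the five headings total 1,000; the heading name "Classrooms" for the fourth division, the function names, and the fractional ratings are assumptions made for the example.

```python
# Weights per sub-item, as listed above. Two assumptions are made here:
# the fourth heading is named "Classrooms", and artificial lighting is
# weighted 20 so that the whole card totals 1,000 points.
SCORE_CARD = {
    "Site": {"Location": 55, "Drainage": 30, "Size and form": 40},
    "Building": {"Placement": 25, "Gross structure": 60,
                 "Internal structure": 80},
    "Service systems": {
        "Heating and ventilating": 70, "Fire protection system": 65,
        "Cleaning system": 20, "Artificial lighting system": 20,
        "Electric service system": 15, "Water supply system": 30,
        "Toilet system": 50, "Mechanical service system": 10,
    },
    "Classrooms": {
        "Location and connection": 35, "Construction and finish": 95,
        "Illumination": 85, "Cloakrooms and wardrobes": 25,
        "Equipment": 50,
    },
    "Special rooms": {
        "Large rooms for general use": 65,
        "Rooms for school officials": 35,
        "Other special service rooms": 40,
    },
}


def heading_totals(card):
    """Sum the sub-item weights under each principal heading."""
    return {heading: sum(items.values()) for heading, items in card.items()}


def score_building(card, ratings):
    """Score one building.

    `ratings` maps (heading, item) to the fraction of that item's weight
    the judges allow (0.0 to 1.0). Items not listed receive full credit.
    This rating scheme is hypothetical, chosen only for illustration.
    """
    total = 0.0
    for heading, items in card.items():
        for item, weight in items.items():
            total += weight * ratings.get((heading, item), 1.0)
    return total


if __name__ == "__main__":
    print(heading_totals(SCORE_CARD))
    # A building perfect in every respect except poor drainage:
    print(score_building(SCORE_CARD, {("Site", "Drainage"): 0.5}))
```

A building rated at half credit on drainage alone loses 15 of the 30 points allotted to that item and nothing else.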

The score card for the measurement of school buildings
is meeting with approval everywhere. Large cities that 
have voted down school bonds for improvement and repairs 
have voted twice the sum called for when the buildings 
have been scored and the exact defects made known to the 
public. 

It seems evident that a score card of this nature will 
eventually become a standard for the erection of most 
school buildings. 

V. The Measurements of Retardation, Acceleration,
and Elimination

Measurements of retardation, acceleration, and elimination
were among the first to be made in the recent general
movement in educational measurements for greater school
efficiency. The movement may be said to have started
in earnest in 1904 when Dr. William H. Maxwell, city
superintendent of the schools of New York, showed in his
annual report that 39 per cent of the pupils in the elementary
grades were above the normal age for the grades
they were in.

This startling revelation caused school officials to check
up their school systems to see if similar conditions prevailed
elsewhere. From that time to the present those
in charge of the schools in every state in the Union have
been measuring the amount of retardation in their schools 
and attempting to assign causes for the retardation found. 

It was the importance of this problem with its bearing 
on the question of the adaptation of the school to the needs 
of the child, and the almost complete lack of definite in- 
formation bearing on the question, that impelled the 
Russell Sage Foundation to undertake in 1907 an investi- 
gation of "some phases of the adaptability of the school 
and its grades to children."¹ The investigators were interested,
not in the individual subnormal or atypical
child, but rather in that large class, varying with local
conditions from 5 to 75 per cent of all the children in 
our schools, who were older than they should be for the 
grades they were in. Data gathered for this study seemed 
to warrant the statement that at least 6,000,000, or 33 per 
cent, of the pupils in the public schools were retarded. 

Thirteen per cent more retardation was found among 
boys than among girls. The percentage of girls who com- 
pleted the common school course was 17 per cent greater 
than the percentage of boys. Studies of this kind brought 
out the fact that the schools as then organized were better 
fitted for the needs of girls than they were for the needs 
of boys. 

It was also found that there was a high correlation be- 
tween retardation and elimination. In those schools where 
the pupils were greatly retarded, a large majority did not 
remain to finish the course. 

¹ ... in Our Schools (Russell Sage Foundation ...

The report of the Commissioner of Education in 1907
shows the distribution of children through the grades in
386 cities of 8,000 population and above. From this report
it was shown that for every 1,000 pupils entering the first
grade, the second grade would have 723, the third grade,
692, the fourth grade, 640, the fifth grade, 552, the sixth
grade, 462, the seventh grade, 368, the eighth grade, 263,
the first year of high school, 189, the second year of high
school, 123, the third year of high school, 81, and the fourth
year of high school, 56.
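The amount of elimination between one grade and the next follows directly from these figures. The short Python sketch below uses the numbers just quoted; the variable and function names are assumptions chosen for the illustration.

```python
# Pupils remaining in each grade for every 1,000 entering the first grade,
# as quoted above from the 1907 report.
SURVIVORS = [
    ("Grade 1", 1000), ("Grade 2", 723), ("Grade 3", 692),
    ("Grade 4", 640), ("Grade 5", 552), ("Grade 6", 462),
    ("Grade 7", 368), ("Grade 8", 263),
    ("High school 1", 189), ("High school 2", 123),
    ("High school 3", 81), ("High school 4", 56),
]


def losses(survivors):
    """Return (from, to, pupils lost, per cent of the earlier grade lost)."""
    rows = []
    for (a, n_a), (b, n_b) in zip(survivors, survivors[1:]):
        lost = n_a - n_b
        rows.append((a, b, lost, round(100 * lost / n_a, 1)))
    return rows


if __name__ == "__main__":
    for a, b, lost, pct in losses(SURVIVORS):
        print(f"{a} to {b}: {lost} lost ({pct} per cent)")
    reached_eighth = SURVIVORS[7][1] / SURVIVORS[0][1]
    print(f"Reached eighth grade: {100 * reached_eighth:.1f} per cent")
```

On these figures 277 of every 1,000 pupils are lost between the first and second grades, and 944 in all before the fourth year of high school.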

Of course conditions have improved since 1907 as a result 
of the studies that have been made on retardation and 
elimination. The following study made by Ayres is 
probably typical of the progress that is being made. 

A little more than ten years ago the Department of
Education of the Russell Sage Foundation made a cooperative
study of 200,000 school children in 29 city school
systems to determine their progress through the grades.¹
In the spring of 1920 the same schools were asked to repeat
their earlier study in order that some estimate could
be made of the progress, if any, attained during the interim.
Fifteen cities repeated the tests using the same
procedure and the same record blanks that had been used
in the first test. The 15 cities repeating the test had 83,283
children in 1911 and 111,680 in 1920.

¹ Leonard P. Ayres, "The Increasing Efficiency of Our City School
Systems," Elementary School Journal, Vol. 21, Feb., 1921, pp. 416-423.

In working out the age-grade tables a pupil who was
seven years old and in the first grade was considered
of normal age and one year was added for each advancing
grade. A pupil could thus be classified both according
to his age and his progress. In regard to age, he was either
younger than normal, normal, or older than normal. With
respect to his progress he was either slow, normal, or rapid.
Since both age and progress were recorded and there were
three groups in each classification, each child could be
assigned to any one of nine different classes. Table XV
shows the classification for each 100 pupils in 1911.
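The nine-fold classification may be sketched in Python as follows. The rule for normal age (seven years in the first grade, one year added per grade) is the one stated above. The rule for progress is an assumption made for the illustration, since the text does not give it: a pupil is taken to have made normal progress when his years in school, counting the present one, equal his grade.

```python
def normal_age(grade):
    """Normal age for a grade: seven in the first grade, one year per grade."""
    return grade + 6


def classify(age, grade, years_in_school):
    """Place a pupil in one of the nine age-progress classes.

    Returns (age_group, progress_group). The progress rule is a
    hypothetical one: normal progress means one grade per year of
    attendance, with the current year counted.
    """
    expected = normal_age(grade)
    if age < expected:
        age_group = "young"
    elif age > expected:
        age_group = "old"
    else:
        age_group = "normal"

    if years_in_school < grade:
        progress = "rapid"
    elif years_in_school > grade:
        progress = "slow"
    else:
        progress = "normal"
    return age_group, progress


if __name__ == "__main__":
    print(classify(7, 1, 1))    # seven years old, first grade, first year
    print(classify(12, 4, 6))   # twelve years old, fourth grade, sixth year
    print(classify(9, 4, 3))    # nine years old, fourth grade, third year
```

A pupil may be over age without being slow, as when he entered school late, which is why age and progress are recorded separately.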

Table XV. — School Children by Young, Normal, and Old, and
by Rapid, Normal, and Slow Groups, Fifteen Cities, 1911

(After Ayres)

            Young   Normal    Old    Total
Rapid          6       3       2      11
Normal        20      21      11      52
Slow           2       9      26      37
Total         28      33      39     100


Table XV is read as follows: Six children were younger
than normal for their grades and had progressed faster
than normal. The 21 who appear in the second column
were of normal age and made normal progress, and so on.

Table XVI shows the conditions in 1920, computed on
a basis of 100 pupils.

Table XVI. — School Children by Young, Normal, and Old, and
by Rapid, Normal, and Slow Groups, Fifteen Cities, 1920

(After Ayres)

            Young   Normal    Old    Total
Rapid         10       2       1      13
Normal        28      23       7      58
Slow           2       9      18      29
Total         40      34      26     100


The data of Table XVI show that conditions in 1920
were better than in 1911. The children who were both
young and making more than normally rapid progress
had increased from six in each 100 to ten. Those in the
center of the table, of normal age and making normal
progress, had increased from 21 per cent to 23 per cent.
The most important change is that in the figures in the
lower right-hand corner which shows that the unfortunate
misfits who were over age and making slow progress had
diminished from 26 per cent to 18 per cent.

During the nine years the percentage of over-age children
had fallen from 39 to 26 and the proportion of slow
pupils from 37 in each 100 to 29 in each 100. These
improvements are large and important. They represent
educational economy, financial saving, and human conservation.¹
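The comparisons drawn above can be checked against the two tables with a brief Python sketch. The cell values are those of Tables XV and XVI, the center cell for 1920 being the 23 per cent named in the text; the function names are assumptions made for the illustration.

```python
# Pupils per 100 in each class: outer key is progress, inner key is age.
TABLE_1911 = {
    "rapid":  {"young": 6,  "normal": 3,  "old": 2},
    "normal": {"young": 20, "normal": 21, "old": 11},
    "slow":   {"young": 2,  "normal": 9,  "old": 26},
}
TABLE_1920 = {
    "rapid":  {"young": 10, "normal": 2,  "old": 1},
    "normal": {"young": 28, "normal": 23, "old": 7},
    "slow":   {"young": 2,  "normal": 9,  "old": 18},
}


def progress_totals(table):
    """Row totals: pupils per 100 making rapid, normal, and slow progress."""
    return {progress: sum(ages.values()) for progress, ages in table.items()}


def age_totals(table):
    """Column totals: pupils per 100 who are young, normal, and old."""
    totals = {}
    for ages in table.values():
        for age, n in ages.items():
            totals[age] = totals.get(age, 0) + n
    return totals


if __name__ == "__main__":
    for year, table in (("1911", TABLE_1911), ("1920", TABLE_1920)):
        print(year, progress_totals(table), age_totals(table))
    change = {
        (p, a): TABLE_1920[p][a] - TABLE_1911[p][a]
        for p in TABLE_1911 for a in TABLE_1911[p]
    }
    print("Change in over-age slow pupils:", change[("slow", "old")])
```

The marginal totals reproduce the figures quoted: over-age pupils fall from 39 to 26 in each 100, and slow pupils from 37 to 29.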

When the survey movement in the schools started about 
1912, the questions of retardation and elimination received 
a great deal of attention. A great deal of work has been 
done in this field by Ayres, Strayer, Thorndike, and others. 
The field has been rather completely covered in state and 
city surveys and other official reports. 

The question of the bright child, the accelerated child, 
is just now receiving much attention. Educators are be- 
ginning to realize that perhaps the bright child has suffered 
from our ill-adapted school system rather than the dull
one. Besides, it is the bright child who will eventually 
become a leader in society, a moulder of public opinion, 
and hence the child who will yield the largest income on 
the investment made. Schools are accordingly being re- 
organized to take special cognizance of the bright child. 
Large sums of money are being donated to educators for 
special work in this field. Large cities are holding summer
sessions to make it possible for the bright child to
gain a half grade or even a grade during six or eight weeks
in the summer. Special classes are being provided for him 
in most of the larger school systems and in many of the 
more progressive smaller ones. 



¹ Ibid., pp. 418-419.

All of this means measurements. The bright children
are selected by determining their general intelligence and
school achievements. It is not too much to prophesy, perhaps,
that the chief reorganization of the school will be
along lines which will take cognizance of the individual 
differences among children. 

Bibliography

1. Ayres, Leonard P., "The Increasing Efficiency of Our City
School Systems," Elementary School Journal, Vol. 21, Feb., 1921,
pp. 416-423.

2. Anderson, The Determination of a Spelling Vocabulary
Based Upon Written Correspondence (Ph. D. thesis, University
of Iowa, 1917).

3. Bagley, W. C., "The Determination of Minimum Essentials
in Elementary Geography and History," National Society for the
Study of Education, Fourteenth Yearbook, Part I, pp. 131-146.

4. Burgess, May Ayres, The Measurement of Silent Reading
(Russell Sage Foundation, New York, 1920).

5. Chapman, J. C., Scientific Measurement of Classroom
Products (Silver, Burdett & Co., Boston, 1917).

6. Courtis, S. A., The Gary Public Schools, Measurement of
Classroom Products (General Education Board, New York, 1919).

7. Gregory, C. A., "The Reading Vocabularies of Third-Grade
Children," Journal of Educational Research, Vol. 5, 1923.

8. Gregory, C. A., and Spencer, Peter L., "A Geography
Test for the Sixth, Seventh and Eighth Grades," School and
Society, Vol. 15, 1922.

9. Judd, Charles H., Measuring the Work of the Public
Schools, Cleveland Educational Survey (Russell Sage Foundation,
New York, 1916).

10. Jones, W. Franklin, Concrete Investigation of the Materials
of English Spelling (University of South Dakota).

11. Monroe, Walter S., Measuring the Results of Teaching
(Houghton Mifflin Co., 1918).

12. Monroe, W. S., De Voss, J. C., and Kelly, F. J., Educational
Tests and Measurements (Houghton Mifflin Co., 1917).

13. Pintner, Rudolf, and Paterson, Donald, A Scale of
Performance Tests (D. Appleton & Co., 1917).

14. Rice, J. M., "The Futility of the Spelling Grind," Forum,
Vol. 23, pp. 163-172, 409-419.

15. Starch, Daniel, Educational Measurements (The Macmillan
Co., 1917).

16. Strayer, George, and Engelhardt, N. L., Score Card for
City School Buildings (Teachers College, Columbia University,
1920).

17. "Standard Tests for the Measurement of the Efficiency
of Schools and School Systems," National Society for the Study
of Education, Part I, Fifteenth Yearbook (Public School Publishing
Co., Bloomington, Ill., 1916).

18. Terman, Lewis M., The Measurement of Intelligence
(Houghton Mifflin Co., 1916).

19. "The Measurement of Educational Products," National
Society for the Study of Education, Part II, Seventeenth Yearbook
(Public School Publishing Co., 1918).

20. Wilson, G. M., and Hoke, J. Kremer, How to Measure
(The Macmillan Co., 1921).

21. Whitbeck, R. H., "Where Shall We Lay the Emphasis in
Teaching Geography?" Education, Vol. 31, pp. 108-116.

22. Woody, Clifford, Measurement of Some Achievements in
Arithmetic, Teachers College Contributions to Education, No. 80,
1920.

23. Various Conferences on Educational Measurements, Indiana
University Bulletins (University of Indiana, Bloomington,
Ind.).




EDUCATIONAL STATISTICS, GENERAL STATEMENT 



The mastery of one more set of tools is necessary before 
the educator may consider himself fully equipped to speak 
intelligently about his educational processes and products.
Twenty-seven million school children now sit at the feet
of 750,000 teachers to receive instruction. The annual
cost of operating this great institution is well beyond the
billion dollar mark. In a great many school systems more 
than 100,000 school children are under the direction of a 
single educator. The school systems have grown so large 
and there are so many individual characteristics and 
variations that affect the various groups that it is impos- 
sible, without the aid of additional tools by which we may 
compare, contrast, and weigh one tendency with another, 
to get an idea of the general movement of the group as a 
whole. 

The human mind is so constituted that it cannot image 
and comprehend a large number of distinct impressions 
at any one time. For example, he would be a man with 
exceptional mnemonic power who could listen to the read- 
ing of two lists of grades of 100 each that were made by 
students taught by two different methods in the subject 
of addition and tell which method was the better as shown 
by the scores made by the pupils in each group. 

The power needed to detect the movements of groups 
as a whole, when the movements of the individuals within 
the group are many and varied, is to be found in the 
elements of statistical methods. 



Use of Statistics in Other Fields. — The educator is not
a pioneer in this field, but is simply following the lead
of the biologist, the economist, the sociologist, the natural 
scientist, and others. The importance of statistical 
methods applied to these sciences is rarely recognized. 
The whole doctrine of evolution and heredity rests in 
reality on a statistical basis. It is in this direction that 
the most important new work of a statistical nature is 
being done. Out of the great number of observations, 
such as the measurements of the height of a group of 
men, the type is found, that is, the average about which 
all the measurements are grouped according to some
definite law. The problem is then to determine whether 
this type, or the groupings about it, change, and in what 
way. The differences found in successive generations form 
the data on which arguments as to evolution and develop- 
ment are founded. The same methods apply equally to 
fossil remains, to zoological species, and other organic 
forms. If this method were neglected many valuable argu- 
ments would lose their force and theories would be based 
on personal impressions of phenomena instead of on scien- 
tific measurements. 

In certain sciences a higher degree of accuracy is required
than is possible with a single measurement. For
every measurement, however apparently absolute it may
be, is a relative thing, made in terms of something else
that is also fallible; that is to say, that is subject to variation.
All scientific measurements, in other words, are made
in terms of units theoretically invariable, but which are
always practically applied by means of a mass of matter
used to measure with, which, itself, is of necessity a more
or less variable quantity; how variable, being again, in
turn, a matter of determination. Therefore, many measurements
are made of the same magnitude, and the average
taken as the true amount. The astronomer, for instance,
uses statistical methods in getting the position of the
heavenly bodies. The method of least squares¹ was introduced
by him in locating the position of a star because
he was anxious to choose the best of several slightly discrepant
observations of the position of the star.

The Question of Error. — In all physical and biological 
observations the usual method is to take several measure- 
ments of the same quantity in order to get the most nearly 
accurate result obtainable as a measure of the thing in 
question. To all such measurements there enters the factor 
of experimental error due to a number of causes such as 
environmental conditions, the apparatus, or the observer 
himself. 

The errors fall in two general classes: They are (1) the
constant errors, which, in all measures of the same quan- 
tity, made with the same care, and under the same condi- 
tions, have the same magnitude, or, whose presence and 
magnitude are due to some fixed cause; and (2) the so- 
called accidental errors such as those due to fatigue, cold, 
nervousness, poor eyesight, or other temporary disability 
of the experimenter, or to the constitutional bias known 
as the personal equation. Some errors, of course, are simple
mistakes, as in reading off the wrong figure, mistaking
a 3 for a 5, for instance.

After a full investigation of the constant errors in all 
physical and biological measurements, the problem then 
remains of combining the observations so that the remain- 
ing accidental errors shall have the least probable effect 
upon the results, and it is to bring about this combination 
of observations that we employ the method of least squares. 
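For repeated observations of a single quantity, the value that makes the sum of the squared deviations least is the arithmetical mean. The Python sketch below illustrates this with five observations; the figures are hypothetical and the function names are assumptions made for the example.

```python
def mean(observations):
    """Arithmetical mean of a list of observations."""
    return sum(observations) / len(observations)


def sum_of_squares(observations, point):
    """Sum of the squared distances of the observations from `point`."""
    return sum((x - point) ** 2 for x in observations)


if __name__ == "__main__":
    # Five hypothetical readings of the same quantity, in any unit.
    readings = [12, 15, 11, 14, 13]
    m = mean(readings)
    print("mean:", m)
    # Try candidate values on either side of the mean; the sum of
    # squares is smallest at the mean itself.
    for candidate in (12.0, 12.5, m, 13.5, 14.0):
        print(candidate, sum_of_squares(readings, candidate))
```

Moving the candidate value away from the mean in either direction can only increase the sum of squares, which is the sense in which the mean is the least-squares value for direct observations of equal precision.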

¹ Used to locate a position such that the sum of the squares of the
distance from that point is the minimum.

Distribution of Measures about a Point of Central Tendency. —
The averages obtained by different scientists
from the same series of biological, sociological, and other
observations are rarely identical. From such a group of
measurements it is necessary to deduce the most probable
estimate. It was early discovered that the measurements
obtained by different scientists measuring the same thing
showed a certain definite arrangement in accordance with
which values at or near the average of all the measures
were greatest in frequency; that positive errors were about
as frequent as negative ones of the same magnitude; and
that large errors seldom occur. The center of balance
about which the errors, or observations, fall is known as
the arithmetical mean. In reference to the arithmetical
mean as a measure of the most probable value of a series
of measurements on the same thing, Merriman says:¹

The most probable value of a quantity which is observed
directly several times with equal care, is the arithmetical mean
of the measurements. The average, or arithmetical mean, has
always been accepted and used as the best rule for combining
direct observations of equal precision, upon one and the same
quantity ... if the measurements be but two in number, the
arithmetical mean is undoubtedly the most probable value; and,
for a greater number, mankind, from the remotest antiquity, has
been accustomed to regard it as such.

¹ A Textbook on the Method of Least Squares (New York, 1913), p. 22.

For example, out of ten discrepant results, it is impossible
to ascertain the true value. What is the best representative
of that value? Experience has shown the arithmetical
mean to be the best, that is, the most representative value
of a series of observations made under the same conditions,
all being equally reliable. The arithmetical mean is so
regarded, because it is a value, the deviations from which
in the plus and minus directions, being equally probable,
will cancel one another. Or again, if one hundred judges
were called upon to judge the length of a certain room
which was actually 80 feet long, it would be found upon
tabulating the data that the most of the guesses were relatively
close to 80 feet, and as the guesses deviated farther
and farther from 80 their number would grow consecutively
less. It would be found also that the number of
guesses that exceeded 80 were practically equal to those
that were less than 80. In the cases just mentioned, and
others of a similar nature, the mean simply represents a
center of equilibrium, or center of gravity, as it were, of
the variations in the given measurements. Having found
that center of equilibrium — the arithmetical mean — the
next problem is to ascertain what amount of swing or
oscillation there may be on either side of this center. These
are the variations, which correspond to the errors we have
referred to, in the making of physical measurements. The
questions arising relative to variability and measures of
central tendency are treated more at length in subsequent
chapters.
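The case of the hundred judges can be put in figures. The Python sketch below tabulates an imagined set of 100 guesses; the frequencies are invented for the illustration and are not data from any study, and the function names are likewise assumptions.

```python
# Hypothetical tabulation: guess in feet -> number of judges making it.
# The frequencies are invented; they total 100 and thin out away from 80.
GUESSES = {74: 1, 76: 4, 78: 12, 79: 20, 80: 26, 81: 20, 82: 12, 84: 4, 86: 1}


def mean_of_table(table):
    """Arithmetical mean of a frequency table {value: count}."""
    n = sum(table.values())
    return sum(value * count for value, count in table.items()) / n


def deviation_balance(table, center):
    """Return (count below, count above, sum of signed deviations)."""
    below = sum(c for v, c in table.items() if v < center)
    above = sum(c for v, c in table.items() if v > center)
    signed = sum((v - center) * c for v, c in table.items())
    return below, above, signed


if __name__ == "__main__":
    m = mean_of_table(GUESSES)
    print("judges:", sum(GUESSES.values()), "mean guess:", m)
    print("below, above, sum of deviations:", deviation_balance(GUESSES, m))
```

With these invented figures as many judges fall short of the mean as exceed it, and the plus and minus deviations cancel, which is what is meant by calling the mean a center of equilibrium.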

The graphic representation of the distribution of data
in reference to a measure of central tendency as discussed
above is known as a normal or probability surface, and the
curve representing it is known as a normal probability
curve discussed in a previous chapter.

Measures differing from the average as discussed above
were thought of as being in error and this gave rise to the
development of what has been called theory of error.

Considerable space has been given to an exposition of
the significance of the theory of error in this general
chapter on educational statistics because a correct idea of
it is fundamental to the discussion that follows.

Educational Measurements Compared with Measure- 
ments in Other Fields. — We have thus far discussed the 
question of error where a large number of measurements 
were made of the same thing. We may now assume that 



EDUCATIONAL STATISTICS 265 

there is an analogy between a series of measurements of 
the same thing and a serias of single measarements of 
each of a number of individuals alike in some important 
characteristics. In biology, in particular, this analogy was 
found to work very well but was less applicable to economic
data. For this reason there have grown up two schools,
the one adhering to the doctrine of the theory of error
and the other rejecting it in the main. In the treatment
of educational data it is found that educational measure- 
ments resemble those of biology in their structure, that is, 
that the theory of error applies. 

As investigations in new fields of science were constantly 
being made and as the data became more complex, the need 
for better statistical methods became more imperative. Re- 
fined statistical inquiries could not be conducted by the 
crude and cumbersome machinery then in operation. As 
a result of this need the development of the pure theory 
of statistics has had a remarkable growth within the last 
two or three decades. Such men as Francis Edgeworth, 
August Meitzen, Francis Galton, Edward L. Thorndike,
Karl Pearson, G. Udny Yule, and Charles B. Davenport 
have contributed in the field of biological statistics; and 
Arthur L. Bowley, Jacques Bertillon, R. H. Hooker, 
Thomas S. Adams, and Warren Persons have each aided
in establishing statistical methods in the field of economics. 
In dealing with large numbers descriptive of groups in 
any field it is found that special methods become necessary; 
methods that depend on the peculiar properties of large 
numbers; methods that are suitable for describing complex
groups so they can be easily comprehended; methods for 
analyzing the accuracy of statements, for measuring the 
significance of differences, and for comparing one estimate 
with another. All of these fall within the scope of statistics.
Without the aid of statistical methods, we simply
have large numbers and groups of numbers from which
no logical deductions can be drawn. It must be borne in
mind, however, that statistical methods in themselves prove
nothing. The methods selected for use in a particular
situation must agree with the logic and other non-quantitative
facts of that situation. When thus used, statistical
methods aid us in refining our thinking about complex 
masses of data and also in refining our methods of expression.
Bowley says: "The proper function of statistics, indeed,
is to enlarge individual experience." Because statistics
measure only the numerical aspects of a phenomenon,
they should be brought into relation to the personal,
political, aesthetic, and other non-quantitative considerations
that may be of greater importance in deciding on a
course of action. 

Quantities Measured Indirectly. — Statisticians must 
very often content themselves with measuring, not the facts
they wish, but some allied quantity, since it is frequently 
the case that the quantity about which knowledge is desired 
is not capable of numerical measurements. We cannot 
measure health, crime, or poverty, for instance. We can 
measure only death-rate, the number of convictions, or the 
number of persons who receive public relief. Many facts 
in other fields are thus measured. 

Few important actions can be taken by a modern government
or even a modern corporation without a statistical
study of the conditions of the field in question. Statistical 
results are essential when judgments are to be formed on 
any question which involves numbers, quantities, or values ; 
but they should always be used with discretion and care. 
"The most important function of statistics," says Bowley, 
"is evidence to show the relation of one group of phenomena
to another." The information obtained is presumably
intended as a guide for action.

Definition of Statistics. — Statistics has been defined as
the science of averages. They render the meaning of 
masses of figures clear and comprehensible at a glance. 
They give a bird's-eye view of a situation involving a 
complex series with numerous cases in such a way that
we get a picture of the series as a whole. They have to
do with movements of groups as a whole. They refer to
a large mass of facts, or data, that bear upon some human
problem. This is one of the most important principles
involved in statistical methods. The individual members
of a group change rapidly while the whole group changes
slowly. It is impossible, for instance, to follow or measure
the motions of the separate atoms of a body, but com- 
paratively easy to measure the motion and state the laws 
governing the movements of a body as a whole. When we 
wish to obtain a measurement of a group, peculiarities of 
individuals receive little attention. It is only when the 
same peculiarities are possessed by a considerable number 
of persons or things that they become of importance and 
are taken into account. 

Statistics are numerical statements of facts in any 
department of inquiry placed in relation to each other; statistical
methods are devices for abbreviating and classifying
the statements and making clear the relations. Statistics
are almost always comparative. They show the
relative importance, the very thing an individual is most 
likely to misjudge. The absolute magnitude of a quantity 
is of little meaning until we have some similar quantity 
with which to compare it. The object of a statistical 
estimate of a complex group is to present an outline, to 
enable the mind to comprehend with a single effort the 
significance of the whole. 

Statistics deal primarily with variable quantities. A
variable is a quantity that, under the conditions imposed,
may assume different values throughout a discussion. In
education, where we make numerous single measurements
of different pupils grouped together on the basis of some 
common characteristic, each different value constitutes a 
value of the variable.

Laws of Statistical Regularity. — One of the most valuable
contributions of modern scientific statistics is that
it has succeeded in giving us a sufficient picture of a 
group of objects without going through the laborious and 
expensive process of a complete enumeration of all the 
items in the group. Thus, it is by no means necessary, 
in ascertaining the average wage of American working- 
men, to obtain data regarding each man at work. If 
certain typical instances are taken and properly averaged, 
the difference of this average from the true average wage 
of all workingmen is likely to be such a small quantity
as to be, for all practical purposes, negligible. 

In a similar way the anthropologist can discover the 
physical characteristics of a tribe or race by taking care- 
ful measurements of only a small minority of the whole. 
This is due to a law of nature formulated on the mathe- 
matical theory of probabilities, that a moderately large 
number of items chosen at random from among a very 
large group are almost sure, on the average, to have the 
characteristics of the larger group. Thus, if two persons
blindfolded were to pick here and there 300 ears of corn
each from a bin containing 1,000,000 ears, the average 
length of the ears picked by each person would be almost 
identical even though they varied considerably in length. 
It must not be inferred from the above that any number 
of samples, no matter how large, will give exactly the 
same results as would be obtained by the use of the entire 
mass of data. The probability of error diminishes constantly
as the number of items used increases. If, then,
only a few sample items are used, the chance error is 
likely to be so large as to seriously vitiate the results; 
but as the number of samples chosen grows larger, the 
error diminishes until it eventually becomes negligible. 
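
The behavior of random samples described above can be tried out numerically. The following is only a rough sketch: the ear lengths (a mean of 8 inches with a spread of 1.5 inches) are invented for the illustration, since the text gives only the counts of 300 and 1,000,000, and the helper name sample_mean_error is likewise hypothetical.

```python
import random
import statistics

# Hypothetical bin of corn: the lengths in inches are invented for this
# sketch; only the counts (1,000,000 ears, 300 per picker) come from the text.
random.seed(1)  # fixed seed so the sketch is repeatable
BIN_SIZE = 1_000_000
SAMPLE_SIZE = 300

bin_lengths = [random.gauss(8.0, 1.5) for _ in range(BIN_SIZE)]
true_mean = statistics.fmean(bin_lengths)

# Two "blindfolded" pickers each draw 300 ears at random.
picker_a = random.sample(bin_lengths, SAMPLE_SIZE)
picker_b = random.sample(bin_lengths, SAMPLE_SIZE)
mean_a = statistics.fmean(picker_a)
mean_b = statistics.fmean(picker_b)


def sample_mean_error(n):
    """Absolute gap between the mean of n random ears and the bin mean."""
    sample = random.sample(bin_lengths, n)
    return abs(statistics.fmean(sample) - true_mean)


print("mean of the whole bin:", round(true_mean, 3))
print("picker A:", round(mean_a, 3), " picker B:", round(mean_b, 3))

# The chance error tends to shrink as more items are used, though any
# single draw may happen to be better or worse than the one before it.
for n in (10, 100, 1_000, 10_000):
    print(n, "ears -> error", round(sample_mean_error(n), 3))
```

The two pickers' averages should agree closely with each other and with the average of the whole bin, although neither is expected to match it exactly.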

Methods of Statistics. — The quantitative study of education
reveals two principal methods of treating measurements
of human traits and other educational data. On
the one hand, the observer may note only the presence or
absence of an attribute or trait. For example, if 98 de- 
grees Fahrenheit is considered a normal temperature for 
an adult human being, and if an individual who has a 
temperature higher than that is said to have a fever, then 
an observer may examine individuals to see whether or 
not this attribute, the fever, is present. The quantitative 
character in this case arises solely in the counting of the
number of individuals who possess this attribute. The 
method by which we treat statistics collected in this way 
has been defined by Yule as the statistics of attributes. 

On the other hand, the observer may want to know, not 
only as to the presence or absence of the attribute, but 
how much of the attribute is present. In the case cited 
above, he may want to know how much fever each in- 
dividual has. This method of refining statistics and meas- 
uring the actual magnitude of variable attributes is known 
as the statistics of variables. This is the method usually
employed in educational research. For example, we want 
to know, not only whether or not there is retardation in 
our schools, but how much retardation; not only how many
pupils made grades above the passing marks, but how much
above the passing marks; not simply that there is elimination
from school, but how much elimination; and so on.

This method implies that the magnitude of the attribute 
or characteristic has been measured with reference to some 
scale made up of units. 
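
The contrast between the two methods may be sketched in a few lines. The temperature readings below are invented for the illustration; only the threshold of 98 degrees is taken from the example in the text.

```python
# Hypothetical temperature readings in degrees Fahrenheit (invented data).
NORMAL_TEMPERATURE = 98.0  # the value the text treats as normal

readings = [97.6, 98.0, 99.4, 101.2, 98.0, 100.1]

# Statistics of attributes: note only whether the fever is present,
# then count the individuals who possess the attribute.
has_fever = [t > NORMAL_TEMPERATURE for t in readings]
fever_count = sum(has_fever)

# Statistics of variables: measure how much of the attribute is present,
# here the number of degrees above the normal temperature.
fever_amounts = [
    round(t - NORMAL_TEMPERATURE, 1) for t in readings if t > NORMAL_TEMPERATURE
]

print("individuals with fever:", fever_count)
print("degrees of fever:", fever_amounts)
```

The first result is a bare count; the second keeps the magnitude for each individual, which is what later computations of averages and variability require.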



Limitations of Statistics. — Statistics, while extremely
useful to the investigator in almost every line of scientific
inquiry, have limitations and shortcomings which cannot
be overcome. Statistics deal largely with averages and 
these averages may be made up of individual items 
radically different from each other. In the average these 
irregularities are swallowed up. But statistics, from their 
very nature, cannot and never will be able to take into 
account individual cases. The difference between arithmetic
and statistics is that the former attains exactness
while the latter deals with estimates. 

Standard of Accuracy. — While in the physical sciences
very great accuracy of measurements is practicable, this
is far from being true in the case of social phenomena.
In this field a multitude of sources of error are ever
present, many of which can be eliminated by no degree 
of care. Fortunately for the statistician, however, small 
errors are often negligible and in no way obstruct the 
solution of the given problem. Attempts to attain the 
greatest possible degree of accuracy are frequently merely
a waste of time. It might be possible to measure the customs
revenue of the United States to the nearest cent,
for instance, but for ordinary purposes of statistical comparison,
such action is not only superfluous but positively
confusing to the mind, inasmuch as the addition of extra
figures diverts the attention from the fundamental digits.

Compensating vs. Cumulative Errors. — The accuracy of
the final results depends very largely on whether the 
errors are compensating or accumulative. If different 
people were to estimate the length of a given line the 
chances are that as many people would estimate it too 
long as too short. The errors in measuring a line made 
by a pair of chainmen, because of stretching the chain 
too tight or not taking up the slack sufficiently, would
tend in the long run to offset one another. In cases of
this kind the errors are said to be compensating. On the
other hand, if the chain used by the above-mentioned 
surveyors were too short, the longer the line measured the 
greater the error would be. The last case mentioned shows 
the effects of accumulative errors. 
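
A small simulation may make the difference between the two kinds of error concrete. All the figures here are assumptions chosen for the sketch: 10,000 chain lengths, a random error of at most 0.01 foot either way for the compensating case, and a chain 0.01 foot too short for the accumulative case.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

N_CHAINS = 10_000          # chain lengths laid end to end (assumed)
MAX_RANDOM_ERROR = 0.01    # feet, too long or too short with equal chance (assumed)
CHAIN_SHORTFALL = 0.01     # feet by which the faulty chain is too short (assumed)

# Compensating errors: each length is off by a random amount in either
# direction, so the errors tend to offset one another in the long run.
compensating_errors = [
    random.uniform(-MAX_RANDOM_ERROR, MAX_RANDOM_ERROR) for _ in range(N_CHAINS)
]
compensating_total = sum(compensating_errors)


def cumulative_error(n_chains):
    """Total error when every chain length is short by the same amount."""
    return n_chains * CHAIN_SHORTFALL


cumulative_total = cumulative_error(N_CHAINS)

print("total compensating error:", round(compensating_total, 3), "feet")
print("total accumulative error:", round(cumulative_total, 3), "feet")
```

Under these assumptions the accumulative error grows in direct proportion to the length of the line, while the compensating errors leave only a small residue.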

Discrete and Continuous Series. — Quantities to be 
measured may be in either a discrete or a continuous series. 
A discrete series is one with gaps. It is made up of a 
number of integers, as the number of words in a spelling 
test, or the number of children in a class. There are either 
20 words in a spelling test, or 21, or some other integral 
number. There are never 20½ or 20¾ words in the test;
neither are there 19½ children in the class. In each case
the number is an integer. 

A continuous series is one that does not contain gaps 
and is, in theory, capable of any degree of subdivision. 
Most mental traits and social facts belong to this series. 
In actual measurements, a given measure of a continuous 
series does not mean a single point on the scale, but a 
distance along the scale between two limits. For instance, 
when we say that an athlete runs 100 yards in 10^ seconds, 
we do not mean that it was exactly 10^ seconds but that 
it was at least lO-,'^ and less than 10x%- A more delicate 
recording instrument might have recorded the time as 
10^^ seconds, or as 10^^,^, and so on, depending on the 
delicacy of the instrument. 

Undistributed Measures. — The fact that many of our 
marks and measures in education are indefinite and undistributed
leads to a great many errors in educational statistics.
A few illustrations will make this point clear. In
a continuous series the measure zero, which should be a
definite distance on the scale, a measure somewhere between
two limits, is quite indefinite and very confusing
when used as a point of reference in statistics. Unless the 
statistician defines what he means by zero, correct reckoning
in reference to it is impossible. Thus zero may mean
from minus 0.5 to plus 0.5 or from 0 to 1 or some other
measure on the scale. It, many times, means a distance 
on the scale from a point above just not any of the thing 
to be measured to an indefinite distance below. If ten boys 
were given an examination in arithmetic and the test con- 
sisted of ten problems, a boy who failed to solve any of 
the problems would be marked zero, and unless otherwise 
explained this mark would mean anything less than one 
to an unknown lower extreme. The boy who solved six
problems is scored six; but a score of six unless otherwise 
explained means as much as six and less than seven. These 
points should be carefully noted to prevent confusion in 
finding medians and other measures of central tendency 
and variation in subsequent pages. 

Rules for Tabulating Data. — The first thing that the
statistical investigator must decide is the exact nature of 
the problem that he desires to solve. The first essential 
is to make the problem definite and clear-cut. The next 
problem is the arrangement of the data in a frequency 
distribution. At first thought it would seem to be one 
of the simplest things in the world to construct a frequency 
table for recording data; but the beginner who attempts 
to tabulate a complex group of figures will quickly discover 
that the simplicity of the operation is far more apparent 
than real. In fact, when a scientific tabulation has once 
been made, it is often found that a large share of the work 
of analysis is completed. 

In beginning a tabulation the first question that 
arises is whether to put the figures in one or in several 
tables. A single table has the merit of completeness, and 
the data are thus brought into proximity. The table, how- 
ever, if too large, becomes confusing to the eye and there 
is great difficulty in following the lines and columns at
a glance. Each table should be a unit. Rarely should
one attempt to demonstrate in the same table several comparisons
of different natures. Another matter to decide is
whether the table shall show absolute figures, or percentages,
or both. The number of separate headings, or
columns, is a third query which must be answered. The 
more minute the subdivisions, the greater is the accuracy 
obtained. On the other hand, the multiplicity of headings 
prevents the proper emphasis being given to the main facts 
and tendencies shown by the statistics. 

General Directions for Making a Scale and Curve-Plotting.
— Scales and distribution tables are necessary in
statistics for two reasons:

1. The object of the scale may be to present graphically
a vivid picture of the general distribution of the facts
relative to a given problem. One of the most common ways
of representing these facts so that the eye will catch their
general trend at a glance is to plot the curve. This is done
in the following manner: A horizontal straight line is first
drawn, and points are located at equal distances on this
line. At the left end of the line a perpendicular is erected,
and points are laid off in a similar manner on this line.
The two series of points are called the scales. It is usual
to call the point where the perpendicular line intersects
the horizontal line the origin of coordinates, which is
designated by O. The horizontal line is usually designated
by OX and is called the X-axis. The vertical line is
designated by OY and is called the Y-axis. Distances
along the X-axis are spoken of as X-distances or X-coordinates,
and distances along the Y-axis as Y-distances or
Y-coordinates. In making a distribution table and plotting
curves, lines may be drawn parallel to the X-axis
through the points located on the Y-axis, and lines may 
be similarly drawn through the points on the X-axis 
parallel to the Y-axis, or, the curve may be plotted without
drawing these additional lines. The curve is more
easily analyzed, however, if these additional lines are
drawn. 

The procedure in plotting a curve may be illustrated as
follows:

Lay off on the X-axis distances equal to the magnitude
of the part or trait measured, and, at the respective distances
representing each magnitude, erect perpendiculars
to the X-axis. Similarly plot the corresponding traits on
the Y-axis and erect perpendiculars. The intersections
of the perpendiculars thus erected constitute the desired
curve.

[Figure VI]

2. The second reason for making scales and distribution 
tables is to arrange the measures so as to facilitate computation.

Designating the Class Intervals. — Different methods are
used in designating the class intervals. In some instances
the class interval is designated by its mid-point. If the
heights of individuals are under consideration, for instance, 
and the steps are 60, 62, 64, 66, 68, etc., 60 may mean 
from 59 to 61, and 62 may mean from 61 to 63, etc. In 
the Courtis arithmetic tests the size of the step is 1 prob- 
lem. It is not designated by the middle of the step, but 
by its lower limit. For example, 6 problems does not 
mean from 5.5 problems to 6.5 problems, but means from 
6 to 7 problems, and 7 problems means from 7 to 8 prob- 
lems, etc. That is, a pupil would get credit for only 7 
problems even though the eighth one were almost completed.

When the limits of the steps are not clearly designated, 
the computation is difficult to follow. It is, therefore, a 
good policy for the beginner to state clearly what his 
class intervals are before he begins the computation. 

In any class interval there may be many measures. The 
measures are spoken of as the frequencies of the class inter- 
vals, and the total frequency is the sum of all the class 
frequencies. 

Analysis of Results. — The results disclosed by a dis- 
tribution table are seldom fully revealed at a glance. Much 
is therefore added to the value of a table if it is accom- 
panied by a written analysis which points out the principal 
conclusions which may be deduced therefrom, the possible 
errors involved, and the probable causes of the phenomena. 
The power to analyze a table, interpret the results cor- 
rectly, and state the conclusions lucidly and succinctly is 
one of the characteristics indispensable in a good statis- 
tician. 

In studying things of the same variety the work may 
usually be facilitated by dividing the items into classes. 
The simplest mode of classification is to group all the instances
under two headings, the determining factors being
whether they do, or do not, possess a given characteristic.
Thus we may classify people as sane or insane, workmen
as employed or idle, flowers as white or colored, men as
short or tall, and so on.

For some purposes this division by dichotomy, or cutting 
in two, may be most satisfactory but in many cases the 
difficulty arises that there is no distinct dividing line. 
Thus, it is impossible to say at just what point a man
ceases to be short and becomes tall. It is therefore neces- 
sary to lay off arbitrarily a line of demarcation between 
the two classes. But if classes are to be thus arbitrarily 
established, it is often much more advantageous to set up 
a large number of them rather than only two. In practice 
this is usually done by dividing the whole group into 
classes of equal width. Thus, if the tallest trees in a group 
are 39 feet and the shortest 16, and it is desired to divide 
the entire group into five classes, the boundary lines would 
preferably be fixed on the round numbers 15, 20, 25, 30, 
35, and 40. These boundary lines are known as class
limits, and the distance between the two limits of any
class is designated as a class interval. In the case cited
above, 5 would be the class interval. A table formed by 
thus dividing the group into a number of smaller, more 
homogeneous classes, and indicating the number of items 
found in each class is known as a frequency table. The 
number of items falling within the given class constitutes
the size of that class or the frequency. 
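
The construction of such a frequency table may be sketched as follows. The class limits 15 to 40 and the class interval of 5 are those of the tree example; the individual heights are invented, and the sketch assumes that each class includes its lower limit and excludes its upper limit.

```python
from collections import Counter


def frequency_table(values, lower, width, n_classes):
    """Count the values falling in each class [limit, limit + width)."""
    counts = Counter()
    for value in values:
        index = int((value - lower) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} lies outside the class limits")
        counts[index] += 1
    return [
        ((lower + i * width, lower + (i + 1) * width), counts[i])
        for i in range(n_classes)
    ]


# Hypothetical tree heights in feet; the shortest is 16 and the tallest 39,
# as in the text, but the values in between are invented.
tree_heights = [16, 18, 21, 22, 24, 25, 27, 29, 31, 33, 34, 36, 39]

table = frequency_table(tree_heights, lower=15, width=5, n_classes=5)
for (low, high), frequency in table:
    print(f"{low}-{high}: {frequency}")

total_frequency = sum(frequency for _, frequency in table)
print("total frequency:", total_frequency)
```

The total frequency equals the number of trees measured, which is a convenient check that no item was dropped or counted twice.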

The Need for Understanding Statistical Formulas. — 
The real scientist must know his tools. He owes it to the 
science and to the persons who may accept his results to 
be quite familiar with his tools. The blind application 
of formulas in statistics has been encouraged by the con- 
venient manuals that are available and by the fact that 
the theory has been surrounded by intricate and involved 
mathematics so that the non-mathematical student had 
great difficulty in interpreting them. There is little doubt
that the failure of many books on statistical methods to
set forth the fundamental principles involved in the treat- 
ment of statistical data has done much to hinder the prog- 
ress in the use of statistics in education. The necessary 
mathematics is largely elementary arithmetic, and, with 
a few exceptions, there is no need for higher mathematics. 
Special effort has been made in this volume to present 
the fundamental principles of statistics simply and, as far 
as possible, in non-mathematical language. Practically all 
higher mathematics has been eliminated. 




THE MEASUREMENTS OF CENTRAL TENDENCY OR AVERAGES



The previous chapters, dealing with the general nature 
and use of statistics, the methods of organizing materials 
in the form of frequency distributions, and the definitions 
and illustrations of statistical terms, have prepared us
to take up the various methods of statistically treating
the distributions thus organized. It will soon become clear 
that the organization of material into frequency distri- 
butions is but a preliminary step to a more concise descrip- 
tion and representation of it by analyzing the size, number, 
and position of the various items that make up the dis- 
tribution. 

Many things about the nature of the distribution must 
be known before comparisons with other distributions may 
be scientifically made. The limitations of the various 
averages must be fully recognized. The amount of dispersion
about the averages must be known, and the reliability
of the measures determined.

It has been indicated in a previous chapter that we are
primarily interested in comparative values rather than in 
absolute values. In fact, a thing cannot be evaluated until 
it is compared with something else. Statistical data can- 
not be compared until they are expressed in terms that 
properly represent the entire mass. There are three prin- 
cipal ways of describing statistical data in the form of 
frequency distributions, that is, there are three general 
types of measurements of statistical data. They are: (1) 
measurements of central tendency, or averages; (2)
measurements of dispersion or variability; and (3)
measurements of correlation. The first of these will be 
discussed in the present chapter and the other two in the 
two succeeding chapters. 

Averages. — The word average has a very indefinite 
meaning in common parlance. The public uses the term 
very loosely. Generally speaking, the word average as 
used by the public may mean one of two things : It may 
mean the most frequent measure in the group, "the general
run," the typical measure, as the average clerk, the
average teacher, the average size city, the average Amer- 
ican; or, it may mean a different thing altogether, illus- 
trated by what the farmer has in mind when he says that 
his hogs averaged 210 pounds, or his wheat averaged 14.7 
bushels to the acre. 

In the second interpretation of the average, it may be 
that not a single hog in the drove actually weighed 210
pounds, nor a single acre actually yielded 14.7 bushels of 
wheat; hence this average is very different from the other 
which is the typical measure in the series. 

The first average illustrated above is more specifically
known as the mode, and the latter as the arithmetic mean, 
or simply the mean. A discussion of the characteristics 
and limitations of these averages and also those of another 
average called the median will be taken up at some length. 
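
The difference between the two popular meanings of "average" can be seen in a small example. The hog weights below are invented so that their mean is the farmer's 210 pounds although no single hog weighs 210; the computation uses Python's standard statistics module.

```python
import statistics

# Hypothetical weights of a drove of eight hogs, in pounds (invented data).
hog_weights = [195, 200, 200, 200, 205, 215, 225, 240]

mode_weight = statistics.mode(hog_weights)      # the most frequent measure
mean_weight = statistics.mean(hog_weights)      # sum of measures / number
median_weight = statistics.median(hog_weights)  # the middle measure

print("mode:", mode_weight)
print("mean:", mean_weight)
print("median:", median_weight)
print("any hog weighing exactly the mean?", mean_weight in hog_weights)
```

In this invented drove the typical hog (the mode) weighs 200 pounds, while the arithmetic mean of 210 pounds corresponds to no actual animal.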

The Arithmetic Mean. — The arithmetic mean may be 
defined as the sum of all the measures in a distribution
divided by the number of measures. It is represented
by the formula

$$M = \frac{\Sigma fm}{N}$$

where M represents the arithmetic mean of the distribution,
Σ indicates that the products fm are to be added,
m = the value of any measure, f = the number, or frequency,
of the measure of a given value, and N = the total
number of measures. Table XVII illustrates the simple
computation of the arithmetic mean.



Table XVII. — Grades in Per Cent Made by Ten Students in
First-Year Algebra
(Hypothetical Case)

Pupil    Grade    Frequency    Frequency × Measure
  A       84          1               84
  B       91          1               91
  C       68          1               68
  D       79          1               79
  E       95          1               95
  F       93          1               93
  G       87          1               87
  H       85          1               85
  I       90          1               90
  J       92          1               92

Total                10              864

864 ÷ 10 = 86.4 = M
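
The computation in Table XVII can be checked with a few lines of code. This is a minimal sketch; the grade of pupil J is illegible in the scanned table and is taken here as 92, the value required by the printed total of 864.

```python
# Grades from Table XVII. Pupil J's grade is inferred from the printed
# total of 864 (an assumption of this sketch, not a legible table entry).
grades = {
    "A": 84, "B": 91, "C": 68, "D": 79, "E": 95,
    "F": 93, "G": 87, "H": 85, "I": 90, "J": 92,
}

n_measures = len(grades)              # N, the total number of measures
sum_of_measures = sum(grades.values())  # each frequency is 1, so this is the sum of fm
arithmetic_mean = sum_of_measures / n_measures

print("N =", n_measures)
print("sum of measures =", sum_of_measures)
print("M =", arithmetic_mean)
```

Because every grade occurs once, each frequency is 1 and the sum of the products fm is simply the sum of the grades.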



In the treatment of statistical data the distributions are 
rarely so simple as in the above illustration. Instead of 
having one student make a grade of 84, another 91, and 
so on, it would be more probable to have several students 
making grades of 84, 91, etc., ranging all the way from 
about 60 per cent up to 100 per cent. 

Table XVIII is more nearly representative of the way 
data are found and treated statistically. 

Table XVIII illustrates a distribution where each score
is made by more than one pupil; that is, a score of 4 is
made by three pupils, a score of 5 is made by four pupils,
a score of 6 by 21 pupils and so on. Since there is more
than one person making each of the various scores, the
distribution is said to be weighted, and the arithmetic
mean found from such a distribution is called a weighted 
arithmetic mean. 



Table XVIII. — Distribution of Scores on 614 Samples of Penmanship
Made by Children in the Third Grade in the Salt
Lake City Public Schools ¹

Score    Frequency    Score × Frequency
  m          f               fm
  4          3               12
  5          4               20
  6         21              126
  7         55              385
  8         85              680
  9        196            1,764
 10         46              460
 11        102            1,122
 12         44              528
 13         39              507
 14         11              154
 15          4               60
 16          4               64

Total      614            5,882

5,882 ÷ 614 = 9.58 mean



It should be noted that the true mean is found in this 
case the same as in Table XVII. The principles under- 
lying the computation of the simple and weighted means 
are the same. In each case the value of each measure, or 
score, is multiplied by its frequency; the products are 
added ; and their sum is divided by the number of measures. 

The formula for the weighted arithmetic mean is

$$M = \frac{\Sigma fm}{N}$$

where M equals the arithmetic mean, m the numerical value
of any measure, f the corresponding frequency of
occurrence, Σ the sum of the fm's, and N the total number of
measures.
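
The weighted arithmetic mean of Table XVIII may be verified as follows. This is a sketch only; the frequency for a score of 7 is taken as 55, which is the value consistent with the printed product of 385 and with the total of 614 samples.

```python
# Score -> frequency, from Table XVIII. The frequency of score 7 is taken
# as 55, the value that agrees with the product 385 and the total 614.
frequencies = {
    4: 3, 5: 4, 6: 21, 7: 55, 8: 85, 9: 196, 10: 46,
    11: 102, 12: 44, 13: 39, 14: 11, 15: 4, 16: 4,
}

n_measures = sum(frequencies.values())                  # N
sum_fm = sum(m * f for m, f in frequencies.items())     # sum of the fm's
weighted_mean = sum_fm / n_measures                     # M

print("N =", n_measures)
print("sum of fm =", sum_fm)
print("M =", round(weighted_mean, 2))
```

The same three steps serve for the simple mean of Table XVII, where every frequency happens to be 1.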



¹ E. P. Cubberley, School Organization and Administration, p. 154.



In Tables XVII and XVIII, the data are said to be
ungrouped and the exact value of each score is recorded;
that is, there were three pupils who made a score of 4 
(Table XVIII), four who made a score of 5, and so on. 
It so happens that the range on the Thorndike Handwriting
Scale is from Quality 4 to Quality 18 inclusive, making
a total range of 15. If, however, the range were extended,
or the samples graded on a percentage basis, the grades
might range from 0 per cent to 100 per cent. In that case
the distribution might have 100 different scores instead
of 13, as shown in Table XVIII.

Making a distribution table with from 80 to 100 different 
scores in it would involve a great amount of labor. In
order to facilitate matters and lessen the volume of labor,
the data are grouped into class intervals. The number
of class intervals should rarely exceed 20 and a less num- 
ber is, many times, desirable. The only reason for group- 
ing the data into class intervals is to lessen the labor in 
computing the arithmetic mean. By this method, how- 
ever, accuracy is sacrificed somewhat to save labor; but 
the difference between the true arithmetic mean with the 
data ungrouped and the mean found by this method (that
is, with the data grouped) is so small that it is generally
negligible. 

In computing the arithmetic mean two assumptions must 
be made when the data are grouped in class intervals: (1) 
that the measures are distributed uniformly throughout 
the class interval; and (2) that for purposes of computation
the measures in any class interval may be numerically
represented by the mid-point of the class interval.

Two methods may be employed in the solution of the 
arithmetic mean with grouped data. Table XIX repre- 
sents the data in Table XVIII grouped in class intervals 
and computed by the traditional, or long method. 



Table XIX. — Distribution of Scores on 614 Samples of Penmanship
Made by Children in the Third Grade in the Salt
Lake City Public Schools to Illustrate the Computation
of the Mean with Data Grouped in Class Intervals

Class        Mid-point of      Frequency    Measures × Their
Intervals    Class Intervals       f        Corresponding Frequency
16-17.99          17               4               68
14-15.99          15              15              225
12-13.99          13              83            1,079
10-11.99          11             148            1,628
 8- 9.99           9             281            2,529
 6- 7.99           7              76              532
 4- 5.99           5               7               35

Total                            614            6,096

6,096 ÷ 614 = 9.92 weighted arithmetic mean

The true arithmetic mean, as shown in Table XVIII, is
9.58, while the arithmetic mean computed with grouped
data is 9.92, making a difference of 0.34 of a score.
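
The long method with grouped data can be followed step by step in code. The sketch below uses the class intervals and frequencies of Table XIX and represents every measure in a class by the mid-point of that class, as the second assumption above requires; the variable names are illustrative only.

```python
# (lower limit, upper limit, frequency) for each class interval of Table XIX.
# The upper limit is written as the start of the next class, so 8-9.99
# appears as (8, 10) and its mid-point is 9.
classes = [
    (16, 18, 4),
    (14, 16, 15),
    (12, 14, 83),
    (10, 12, 148),
    (8, 10, 281),
    (6, 8, 76),
    (4, 6, 7),
]

n_measures = sum(f for _, _, f in classes)

# Every measure in a class is represented by the class mid-point.
sum_fm = sum(((low + high) / 2) * f for low, high, f in classes)
grouped_mean = sum_fm / n_measures

ungrouped_mean = 5882 / 614  # from Table XVIII

print("N =", n_measures)
print("sum of mid-point x frequency =", sum_fm)
print("grouped mean =", round(grouped_mean, 3))
print("difference from ungrouped mean =", round(grouped_mean - ungrouped_mean, 3))
```

Carried to three decimals the grouped mean comes to about 9.928, which the text reports as 9.92; the gap between the grouped and ungrouped means is about a third of a score either way.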

Computation of Arithmetic Mean by Short Method. — 
Table XX illustrates the computation of the arithmetic 
mean by the short method ; the data are taken from Table 
XVIII. 

The usual method in arranging the frequency distributions
is to begin with the highest scores and work in the
direction of the lower ones, that is, in Table XX, we place 
the scores from 16 to 17.99 at the top of the distribution. 
This is not absolutely necessary, but it is more convenient 
and less liable to errors. 

When the data are grouped in the class intervals, as
illustrated in Table XX, we may then take the mid-point
of any class interval as the assumed mean. It is best to
take the class interval that contains the true mean, although
this is not necessary. The point 9, the mid-point of the
interval 8 to 9.99, was chosen as the assumed mean. The



EDUCATIONAL MEASUREMENT 
Table XX

Class         Frequency     Deviation from the          Frequency ×
Interval      (f)           Assumed Mean Interval (d)   Deviation (fd)

16-17.99          4                  +4                      +16
14-15.99         15                  +3                      +45
12-13.99         83                  +2                     +166
10-11.99        148                  +1                     +148   (+375)
 8- 9.99        281                   0                        0
 6- 7.99         76                  -1                      -76
 4- 5.99          7                  -2                      -14   (-90)

Total           614                                         +285

True mean = 9.92



mid-point of the class interval immediately above is 11
and is said to have a deviation (d) from the assumed
mean of +1. The second class interval above has a
deviation of +2, and so on. The mid-point of the class
interval immediately below the assumed mean has a deviation
of -1; the second one a deviation of -2, and so
on. We next multiply the deviations (d) by the frequency
(f) just as we did in the long method. This gives us
the fd column in the table. It is evident that, if the
assumed mean occupied the same position as the true mean,
the sum of the deviations above it would equal the sum
of the deviations below, but since the true and the assumed
means are not the same, the measures above will not equal
those below. Of course, it is possible that the assumed
mean might equal the true mean, in which case there would
be no correction; but this would rarely happen. In most
cases, therefore, a correction (c) must be added to the
assumed mean to get the true mean. This correction is
the algebraic sum of the fd's divided by the total number
of cases in the distribution. It is expressed by the formula

$$c = \frac{\sum fd}{N}$$

in which c equals the correction to be added, $\sum fd$ equals
the algebraic sum of the frequencies (f) multiplied by their
respective deviations (d), and N is the number of cases in
the entire distribution.

The average amount of deviations from the assumed
mean, taken in the right direction (that is, added
algebraically) evidently would give us the true mean. It is
evident that, if the sum of the plus fd's is greater than the
sum of the minus fd's, the true mean deviates from the
assumed mean in the direction of the positive fd's, or the
true mean is greater than the assumed mean by an amount
equal to the correction c. If, however, the sum of the
negative fd's is greater than the sum of the positive fd's,
then the correction must be subtracted, and the true mean
is less than the assumed mean.

In Table XX the positive fd's exceed the negative fd's
by 285. This divided by the total number of cases (614)
gives 0.46 of a class interval, the average amount by which
the measures are in error when the mean is considered to
be at 9, the mid-point of the class interval. Since the width
of the class interval is 2, we multiply the average error 0.46
by 2, giving 0.92 of an actual unit that must be added to the
assumed mean. Thus we have: 9 + 0.92 = 9.92, the true
mean.

It should be noted that the short method is short only 
when the range is long and, therefore, many class intervals 
are necessary. If the range is short, there is nothing 
gained by using the short method. Following is the sum- 
mary of steps used in this method: 






Summary of Steps in the Computation of the Arithmetic
Mean by the Short Method

1. Group the measures in a frequency distribution table.
Arrange the data in the table in four columns. The first column
at the left contains the class intervals arranged with the highest
scores at the top of the table; the second column contains the
frequencies (f); the third column contains the deviations from
the assumed mean (d); the fourth column contains the products
of the frequencies times the deviations (fd's).

2. Find the total of the frequencies in the second column.

3. By inspection estimate the class interval that contains the
mean and take as the assumed mean the mid-point of this interval.
(The mid-point of any class interval may be taken as the assumed
mean; but it is better to choose the class interval containing the
true mean or one near it.)

4. Consider each class interval as a unit and record in the third
column the number of units that the mid-point of each class
interval deviates from the assumed mean; the first one above
the assumed mean having a deviation of +1; the second one a
deviation of +2, and so on. Deviations below the assumed mean
are treated in the same way, but considered negatively.

5. Multiply each deviation (d) by its corresponding frequency
(f), observing the algebraic signs, and record the product in
column 4.

6. Find the algebraic sum of the fd's in column 4.

7. Divide this sum (which is the difference between the positive
and negative fd's) by the total number of measures (N), which
is the sum of the frequencies in column 2. This gives the
arithmetic mean of the deviations from the assumed mean in terms
of class intervals.

8. Multiply the deviation in terms of class intervals by the
number of units in the class interval.

9. Add (algebraically) the product obtained in 8 to the
assumed mean to get the true mean.
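The nine steps above may be sketched in Python as follows. This is a minimal sketch, not a prescribed procedure: the function name `grouped_mean_short` and its arguments are our own, and the data are those of Table XX with the assumed mean taken at 9 and a class interval two units wide.

```python
def grouped_mean_short(midpoints, frequencies, assumed_mean, width):
    """Short method: assumed mean plus a correction computed from the fd's."""
    n = sum(frequencies)                                   # step 2
    # Step 4: deviation of each mid-point, counted in class intervals.
    deviations = [(m - assumed_mean) / width for m in midpoints]
    # Steps 5 and 6: algebraic sum of frequency x deviation.
    sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
    correction = sum_fd / n                                # step 7: c = sum(fd) / N
    mean = assumed_mean + correction * width               # steps 8 and 9
    return mean, sum_fd, correction


# Table XX, highest class interval first.
midpoints = [17, 15, 13, 11, 9, 7, 5]
frequencies = [4, 15, 83, 148, 281, 76, 7]

mean, sum_fd, correction = grouped_mean_short(midpoints, frequencies, 9, 2)
print(sum_fd, round(correction, 2), round(mean, 3))
```

The unrounded result, about 9.928, is the same figure the long method yields, as it must be, since the two methods differ only in the order of the arithmetic. The 9.92 of the text arises from rounding the correction to 0.46 before multiplying by the width of the interval.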

The advantages and disadvantages of the different forms 
of average will be taken up after the mode and median 
have been discussed. 

The Mode. — The mode is that item or term that is most 
characteristic or frequent in a distribution. It is the value 



which is the fashion (la mode). It represents the typical
fact. In Table XVII, page 280, there is no mode because
one item or grade occurs as often as another. In Table
XVIII, score 9 represents the mode because that score
occurs more frequently than any other.

A mode may be defined as that measure of a variable
fact which appears more frequently than measures directly
above or below it. Distributions may, therefore, be
unimodal or multimodal. A symmetrical distribution is
unimodal because there is only one place in the distribution
where the measures are of greater frequency than
those directly above or below it.

The value of the mode as a measure of central tendency
over the average may be illustrated by the following
example: Suppose one were told that the average wealth
of ten farmers living on a certain highway was over
$100,000 each. It might so happen that one of these farmers
was worth $1,000,000 and the other nine were worth $1,000
each. The average or mean used in this case is misleading.
The mode would be a much better average to use. In this
case the mode would be $1,000, which represents the group
better than the mean, which would be more than $100,000.

The mode has little statistical value other than as an
inspectional average and will, therefore, be discussed very
briefly.
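The example of the ten farmers may be verified with Python's standard `statistics` module. The list `wealth` simply restates the figures of the example; nothing else is assumed.

```python
from statistics import mean, mode

# One farmer worth $1,000,000 and nine farmers worth $1,000 each.
wealth = [1_000_000] + [1_000] * 9

average_wealth = mean(wealth)   # pulled upward by the single extreme case
typical_wealth = mode(wealth)   # the value that occurs most often

print(average_wealth, typical_wealth)
```

The mean comes to $100,900, a sum that none of the ten farmers actually possesses, while the mode is the $1,000 held by nine of them.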

The Median. The fact that the median has not been
rigorously defined, or, if thus defined, the definition has
not been generally accepted, has led to considerable
confusion in its computation.

Rugg says that the median is "defined rigorously as
that point on the scale of the frequency distribution on
each side of which one half of the measures fall."* The
measures, of course, must be arranged according to their
ascending or descending values. While Rugg calls this a
rigorous definition, it, nevertheless, has admitted of many
solutions because of different interpretations of the limits
of class intervals and the lack of uniformity in the
distribution of the cases over them. While theoretically there
would be no class intervals near the center of the
distribution that contain no measures, yet actually this
happens many times in practice when the number of measures
in the distribution is small.

* Harold O. Rugg, Statistical Methods Applied to Education, p. 104.

In practice we have not been rigorously consistent in 
the scoring of test papers and the computation of the 
median from the scores thus obtained. Teachers and 
students have had considerable trouble in finding medians, 
because in actual practice a half dozen methods are being 
employed, no two of which will give the same result. In 
order to help clarify the procedure and point out the basic 
principles and assumptions made, we shall discuss the com- 
putation of the median somewhat in detail. Before doing 
this, however, we shall give some additional definitions 
of the median in order to have it clearly in mind as the 
discussion proceeds. 

Thorndike defines the median thus: "The median, or
50 percentile or mid-measure is the place on the scale 
reached by counting half the measures, in the order of 
their magnitude, or the place on the scale above and below 
which are equal numbers of the measures." This defini- 
tion and the one given by Rugg make no provision for the 
computation of the median when the number of measures 
in the distribution is even and there is no middle measure. 
As a consequence, statisticians have used a variety of 
methods in computing medians with distributions of this 
kind. 

Secrist defines the median thus: "The median of a
series is that item, actual or estimated, in a series, when
arranged consecutively, which divides the distribution into
equal parts. When the number of items is even, it is
half way between the two middle terms; when the number
is odd, it is the middle term." This definition is the same
as that used by McCall, who defines the median thus:*
"When measures are arranged in order of size, the median
is the middle measure or (lacking a middle measure)
midway between the two middlemost measures." A median
thus defined is perfectly definite and admits of but one
interpretation provided there is uniformity in the
treatment of the measures in the class intervals. We shall now
note wherein the methods have differed in the computation
of the median.

1. The Spread of the Score Interval Commonly Used in
Statistics and School Practice. When a pupil takes a test
in addition in arithmetic, for instance, what are the limits
of the various scores? That is, does a score of 2 mean
that the pupil has done a definite amount of work, just
barely finished two problems, and no more, or does it mean
something else? What do five problems mean? The custom
is to give no credit unless the pupil completes at least
one problem. If he does less than one problem, he is given
a grade of zero. Therefore, zero means, in actual practice,
anything between just not any of the thing in question
and 1. One problem means any amount between 1 and 2,
but not exactly 1 and no more. Five problems means
any amount between 5 and 6, and so on. We thus use the
lower limit of the step, or score interval, in recording
grades in arithmetic and in most of the other subjects in
the curriculum. In finding the median score in most of
the school achievement tests, however, the custom has been
to use the middle of the step in computing the median and
the lower limit of the score interval in scoring the papers
and recording the scores.

* William A. McCall, "How to Compute the Median," Teachers
College Record, Vol. 21, March, 1920, p. 126.

From the standpoint of statistics, the middle of the score
interval is the best measure to use. But the score interval
generally used in statistics is not the same as the one used
in actual practice in scoring test papers. From 1 to 2,
2 to 3, etc., are the score intervals used in scoring papers.
But in the computation of the median the common practice
is to employ score intervals with their limits at 0.5, 1.5,
2.5, etc. Statistical treatment is not consistent and does
not conform to the practice in at least three respects: (1)
Statistical methods use the mid-point of the score interval
in computing the median but use the lower limit of the
score interval in scoring the papers. (2) In statistics one
score interval is used in computing the median, and another
is used in scoring the papers and recording the
grades. (3) In statistics the practice has been to use a
different score interval in computing medians from that
used by teachers in scoring test papers and recording the
grades. In the following pages we shall note how statistical
methods may be made more consistent and at the same time
conform to the methods used by teachers in scoring papers.

2. What Formula Shall We Use in Computing a
Median? Some authors recommend the use of the formula
$\frac{N+1}{2}$ in finding the median, and others recommend the
formula $\frac{N}{2}$. The two formulas do not give the same result
in all cases. The question then arises as to which formula
should be used, and if both are to be employed, when shall
the former be used, and when shall we use the latter?
It seems to the author that there is no very good reason
for using the formula $\frac{N+1}{2}$, since the same point on the
scale is not reached in computing the median if we employ
this formula and count in from each end of the series.
Furthermore, if we make the score intervals consistent
with our logic in scoring papers, then the formula $\frac{N+1}{2}$
cannot be employed at all. It is sometimes argued that
if you want the ordinal number of the median you should
use the formula $\frac{N+1}{2}$, but if you want the median point
on the scale, the formula $\frac{N}{2}$ should be used. In answer to
this argument it may be shown that both the ordinal number
and the median point on the scale may be found by
the latter formula. We shall now illustrate the various
cases in the computation of the median.

Computation of the Median, Simple Distribution.
Case 1. The number of items in the distribution is odd.

Table XXI. Scores Made by Thirteen Sixth-Grade Pupils in
the Courtis Arithmetic Tests, Series B

Pupil    C    M    L    A    D    ?    F    J    H    I    ?    B    G
Score   16   15   14   13   12   11   10    9    8    7    6    5    4

Since the median is the measure in the middle score
interval, it may be located by dividing the score intervals
by 2. (The score interval here is the same as the class
interval since there is only one score in a class interval.)

$$\frac{N}{2} = \frac{13}{2} = 6.5$$

In our reasoning let us make the score
intervals conform to practice in scoring papers and assume
that they are from 4 to 5; 5 to 6, etc. Starting with the
score interval 4-5 and counting 6.5 score intervals locates
the median at 10.5. Counting down from the other end
of the series we reach the same point. This is both the
mid-point of the middle measure (a measure is assumed
to be distributed over a score interval) and the mid-point
on the scale. The middle score is the score made by pupil
F. Its mid-point is halfway between 10 and 11, or 10.5.
Therefore 10.5 is the median. The customary way of
computing the median is to take the score intervals one-half
interval lower than these and make 10 the median
score instead of 10.5. This, however, is inconsistent with
the method of scoring the papers.

Case 2. The number of items or scores is even. Let us
eliminate pupil G with a score of 4 from the preceding
distribution and compute the median with an even number
of cases. Table XXI now becomes Table XXII.

Table XXII

Pupil    C    M    L    A    D    ?    F    J    H    I    ?    B
Score   16   15   14   13   12   11   10    9    8    7    6    5

It is evident that the median score cannot be a middle
score in this series since there is no middle score. Applying
the formula used in Case 1 we have

$$\frac{N}{2} = \frac{12}{2} = 6$$

Since there is no middle score, some provision must be made to
satisfy a case of this kind. Since there are 12 score intervals,
it is evident that if we took the junction point of
the 6th and 7th score intervals we would have the mid-point
on the scale. Starting with the 5-6 interval and
counting in six intervals we have 11 as the mid-point on
the scale. This value is obtained by counting in from
either end of the distribution. The usual method of
representing a score interval in statistics is by its mid-point.
The mid-point of the 10-11 score interval is 10.5 and the
mid-point of the 11-12 score interval is 11.5. If we take
the point halfway between these two middlemost values, we
get 11 as the median score, which conforms to the definition
given by McCall and Secrist. This method is logical and
conforms to the way scores are computed from test papers. If, as
has been the custom, the score intervals are taken at 6.5
to 7.5; 7.5 to 8.5; etc., then the median score would be 10.5
instead of 11. This method, however, is inconsistent, as
was pointed out above.

When the Distribution Is Complex. Case 3. Where
more than one pupil makes the same score and the data
are grouped in class intervals. The distribution of
educational data is rarely so simple as those given in Tables
XXI and XXII. Instead of a score being made by just
one pupil it is usually made by more than one, sometimes
by hundreds of pupils. This necessitates grouping the
data in class intervals in order to condense the distribution
table within workable limits. It also involves the
distribution of the items within the class intervals.

The question of discrete and continuous series discussed 
in Chapter IX should be reviewed in order to have clearly 
in mind the type of data under consideration. Most 
measurements in education belong to a continuous series 
or may be treated as continuous even though discrete. 

Having decided the question as to whether the data are 
discrete or continuous, the next important question to de- 
cide is the distribution of the items in the various class 
intervals. Suppose we had a group of 50 children who 
made a score of 7 in arithmetic on the Courtis arithmetic 
test. The chances are that some of the pupils had the 
eighth problem almost finished, when the signal to stop 
was given. Others had it three-fourths finished, some had 
it half finished, some one-fourth finished, and a few had 
just barely finished the seventh problem when time was 
called. In other words, instead of the 50 pupils just barely
finishing the seventh problem the instant that time was
called, the probabilities are that they were distributed
about equally over the interval 7 to 7.9999+. Instead
of having 50 pupils make a score of 7, let us simplify the
problem and take a distribution where four pupils make
a score of 7 and the median falls within the 7-8 interval.









Table XXIII

Pupil    C    M    L    A    D    B    F    J    H    I    K    E    G
Score    9    9    9    8    8    7    7    7    7    6    5    5    4

The number of scores is 13. The median score is the
seventh score counting in from either end of the series.
The score made by pupil F is therefore the median score.
But there are four pupils who made a score of 7. The
best guess we can make as to the way those four scores
are distributed in the class interval 7-7.9999+ is to say
that they are distributed equally over the class interval.

The following illustration will make it clear what is
meant by being distributed equally over the class interval.
The table shows that pupil G had solved four problems;
pupil E, five; pupil K, five; pupil I, six; pupils H, J, F,
and B, seven, and so on, when the examiner gave the signal
to stop. It does not mean, however, that just the instant
the signal to stop was given pupil G had just barely
finished four problems and had not started on the fifth
one, or that pupils E and K had just barely finished five
problems and had done no work on the sixth, and so on.
These scores indicate the number of problems actually
completed, and each of the 13 pupils may or may not have
attempted more problems than those actually completed.
Since the interval 7 to 7.9999+ is the interval that
contains the median, the problem is to find out what is the
most probable amount of work done by the seventh pupil
counting in from either end of the series. Let us represent
graphically the 7-8 interval. We know that four pupils
completed seven problems; but we do not know how much
more they did. Now, our best guess would be that the
actual amount of work done by these four pupils is
distributed somewhat as follows:

Let us represent the class interval 7.00 to 7.9999+ (or 8)
by Figure VII and let the line AB represent the distance
through this step.

            Figure VII

    B ----- 7.9999+ (8)
              pupil B   87.5 per cent
    E ----- 7.75
              pupil F   62.5 per cent
    D ----- 7.50
              pupil J   37.5 per cent
    C ----- 7.25
              pupil H   12.5 per cent
    A ----- 7.00

When the signal to stop was given,
the first of these four students (student H) had completed
7 problems and had done work on the eighth one ranging in
amount somewhere between 0 per cent and 25 per cent; J, the
second student, had the eighth problem from 25 per cent to
50 per cent completed; F, the third pupil, had completed
from 50 per cent to 75 per cent of the eighth problem, and
B, the fourth pupil, had completed from 75 per cent to 100
per cent. This would be a safer guess than
to guess that all four of these students had just barely
completed the seven problems when the examiner gave the
signal to stop. Still another guess is necessary. Assuming
that we know that the first pupil in the 7-8 interval
had done work on the eighth problem ranging in amount
somewhere between 0 per cent and 25 per cent, and that
we wanted to make the best estimate we could make, taking
one case with another, as to how much he actually
did, we would estimate that he had gone halfway through
this interval of 0 per cent to 25 per cent, or, that he had
done 12.5 per cent of the eighth problem when the signal
to stop was given. Reasoning the same way for the other
score intervals, the second pupil (J) would have had 37.5
per cent of the eighth problem completed; the third (F)
62.5 per cent, and the fourth one (B) 87.5 per cent.



Now, since we have the most reasonable distribution of
these cases over the class interval, we are ready to continue
the work in computing the median. Starting at the
lower end of the distribution we have four cases up to the
7-8 interval. Since the median score is the score made
by the seventh pupil, and since there are 13 score intervals
in the entire series, therefore, counting in from either end
of the series $\frac{N}{2}$ score intervals, we would have half the
distance through the series and also have the mid-point
of the seventh score interval. We have four cases up to the
7-8 class interval and must have 2.5 of the 4 score intervals
in the 7-8 class interval. Noting Figure VII, we see that
2.5 score intervals would give us a point 62.5 per cent
through the 7-8 class interval, or a median point on the
scale of 7.625. But 7.625 is also the mid-point of the
seventh score interval. Therefore the median is 7.625.
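The reasoning of Cases 1 to 3 may be sketched in Python. The sketch assumes, with the text, that a recorded score s stands for the interval from s to s + 1 and that the measures sharing a score are spread equally over that interval; the function name `interpolated_median` is ours.

```python
from collections import Counter


def interpolated_median(scores, width=1.0):
    """Median point on the scale, counting N/2 score intervals from the lower end."""
    if not scores:
        raise ValueError("no measures given")
    counts = Counter(scores)          # frequency of each recorded score
    half = len(scores) / 2            # the formula N/2
    below = 0                         # measures counted so far
    for lower in sorted(counts):
        f = counts[lower]
        if below + f >= half:
            # Take (half - below) of the f equal parts of this interval.
            return lower + (half - below) / f * width
        below += f


table_xxi = list(range(4, 17))                              # Case 1, N = 13
table_xxii = list(range(5, 17))                             # Case 2, N = 12
table_xxiii = [9, 9, 9, 8, 8, 7, 7, 7, 7, 6, 5, 5, 4]       # Case 3, N = 13

print(interpolated_median(table_xxi))
print(interpolated_median(table_xxii))
print(interpolated_median(table_xxiii))
```

The three results are 10.5, 11.0, and 7.625, agreeing with the medians found above for Tables XXI, XXII, and XXIII.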

Table XXIV illustrates the computation of the median 
with the data grouped in class intervals, each of which 
contains five units instead of one, and with some of the 
class intervals containing a large number of cases. 

Table XXIV. Distribution of Marks Given in English to 263
High-School Pupils

Class Interval     Number of Pupils

95.0-100.00              20
90.0- 94.99              63    (83)    Adding down
85.0- 89.99              38    (121)
80.0- 84.99              47
75.0- 79.99              38    (95)
70.0- 74.99              33    (57)
65.0- 69.99              16    (24)
60.0- 64.99               2    (8)
55.0- 59.99               3    (6)
50.0- 54.99               1    (3)
45.0- 49.99               1    (2)     Adding up
40.0- 44.99               1

                     N = 263

Applying formula, $\frac{N}{2} = \frac{263}{2} = 131.5$.



Since there are 263 cases, 131.5 score intervals from
either end of the series will be the mid-point on the scale
and also the mid-point of the middlemost measure.

Adding up from the bottom, we have 95 cases up to the
80-84.99 class interval. This class interval contains 47
cases, or 47 score intervals, and we must take 36.5 of these
score intervals to give us the median point on the scale,
and the mid-point of the middlemost measure. The same
result is obtained by counting down from the top. There
are 121 scores down to the upper margin of the class
interval 80.0-84.99. Therefore, we must take 10.5 scores
from the 47 in that interval, or go $\frac{10.5}{47}$ of the distance
through the class interval coming in from the top. Since the
number of units in the interval is five, $\frac{10.5}{47}$ of 5 subtracted
from 85, the upper limit of the interval, equals 83.88, which
is the median.

Case 4. Where the median falls in the 100 or the zero
class interval.

Table XXV. Distribution of the Marks Given to a Freshman
Class in Algebra

Class Interval     Frequency

100                    13
90-99.99                1
80-89.99                2
70-79.99                1
60-69.99                1
50-59.99                2
40-49.99                1
30-39.99                1
20-29.99                1

Total                  23

Substituting in the formula, $\frac{N}{2} = \frac{23}{2} = 11.5$. Counting
from either end of the series we find that the median lies
in the 100 interval. The cases in this interval are
undistributed and are considered to lie at a point; therefore
there is no correction, and the median is 100.

In the solution of problems no credit is usually given
unless the student solves one or more problems correctly.
In a class of 25, it might be that 13 pupils would fail to
complete any problems. In this case the median would
depend upon whether the cases in the zero interval, that is,
from 0 to 0.999+, were distributed equally over the interval
or whether they were considered to be piled up at its
lower limit. From the reasoning in the former cases it
would be more logical to consider them distributed over
the interval and take $\frac{12.5}{13}$ of the distance through the
interval as the median measure. Therefore, the median is
0.96.

Case 5. When the partial sum is the half sum and there
is no correction. Another case that proves troublesome
is illustrated in Table XXVI, taken from Monroe.*

Table XXVI

Scale     Frequency

Total
Approximate median
Correction
True median

* Measuring the Results of Teaching, p. 106.



In explaining the median in this case, Monroe says:
"Case A is where the partial sum (13) is also the half
sum. The approximate median is in the next interval (9).
Since the difference between the partial sum and the half
sum is zero, there is no correction and the true median
is 9.0."

This practice does not conform to the theory discussed
above. Neither is it consistent with the instructions that
Monroe gives for finding the median in his reading
tests.

Reasoning as we did from Table XXIII, we note that
since the interval 8-9 contains four measures which are
theoretically distributed equally over the interval, the
thirteenth measure would lie in the class interval somewhere
between 8.75 and 8.9999 and our best guess is that
it lies at the mid-point of that interval or at 0.875 of the
distance through the class interval, which would make the
median 8.875 instead of 9.

Case 6. Where measures are discrete. — Table XXVII 
illustrates the computation of the median with measures 
discrete. 



[Table XXVII]

Since there are 117 measures the 59th measure is the
median measure. Counting up from the bottom we find
that the 59th class lies somewhere among the classes
containing 13 pupils. Now since it is, of course, impossible
for a class to contain a fraction of a pupil, the median
is 13.

Case 7. Where the median falls within a class interval 
containing no cases. 

Table XXVIII

Scale     Frequency

 11            8
 10           10
  9           20
  8            0
  7           10
  6           13
  5           10
  4            5

Total         76

$\frac{N}{2}$ equals 38. There being an even number of cases,
the median is the mid-point between the two middle cases.
But there is one class interval between the two middlemost
cases that contains no measures. Since the number
of cases is even, the median point on the scale would
ordinarily be the junction point of the two middlemost
score intervals, but, since the two middlemost score intervals
are not contiguous, we add half the intervening class
interval to the score interval above and the other half
to the score interval below and locate the median at the
mid-point of the gap between the two middlemost score
intervals. Halfway through the 8-9 interval, therefore, or
8.5, is the median.
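Case 7 may be sketched in Python by counting N/2 measures in from each end and taking the point halfway between the two places reached. This is only one way of expressing the rule just given; the function name `median_from_both_ends` is ours, and each scale value s is assumed to stand for the interval from s to s + 1.

```python
def median_from_both_ends(lower_limits, frequencies, width=1.0):
    """Count N/2 measures in from each end; return the mid-point of the two results."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no measures given")
    half = n / 2

    # Counting up from the lower end (lower_limits are in ascending order).
    counted = 0
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and counted + f >= half:
            from_below = lower + (half - counted) / f * width
            break
        counted += f

    # Counting down from the upper end.
    counted = 0
    for lower, f in zip(reversed(lower_limits), reversed(frequencies)):
        if f > 0 and counted + f >= half:
            from_above = lower + width - (half - counted) / f * width
            break
        counted += f

    # The two points differ only when an empty interval lies at the middle.
    return (from_below + from_above) / 2, from_below, from_above


# Table XXVIII, written from the lowest scale value upward.
scale = [4, 5, 6, 7, 8, 9, 10, 11]
frequency = [5, 10, 13, 10, 0, 20, 10, 8]

median, from_below, from_above = median_from_both_ends(scale, frequency)
print(from_below, from_above, median)
```

Counting up reaches 8.0 and counting down reaches 9.0, so that the median falls at 8.5, the mid-point of the empty interval. Where no gap occurs the two counts meet at the same point and the averaging changes nothing.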

A somewhat lengthy discussion has been given on the 
computation of the median, because the median is one of 
the chief measures of central tendency, and also because 
the methods are not uniform. We shall now give a sum- 
mary statement of the steps for the computation of the 
median. 



Summary of Steps in the Computation of the Median

1. Arrange the data in a frequency distribution, taking special
care to note the limits of the class intervals and also the limits
of score intervals.

2. Find one-half the total number of measures (the half sum,
$\frac{N}{2}$).

3. Beginning at either end of the distribution, preferably the
lower end, count the number of measures included in all class
intervals up to the interval containing the median.

4. Subtract this number from the half sum of all the measures
computed in step 2. The difference is the number of measures
that must be taken from the next interval to bring the
computation up to the median point on the scale.

5. Divide this remainder by the number of cases in the class
interval containing the median and multiply the quotient by the
number of units in the class interval.

6. Add this number to the value of the lower limit of the
class interval containing the median, if computation is made from
the lower end of the distribution, and subtract it from the upper
limit of this class interval, if computation was made from the
upper end of the distribution. This is the median point on the
scale.
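The six steps above may be sketched in Python with the marks of Table XXIV as data. The function name `grouped_median` and the flag `from_upper_end` are our own devices for illustration; the class intervals are taken as five units wide, with their lower limits at 40, 45, and so on.

```python
def grouped_median(lower_limits, frequencies, width, from_upper_end=False):
    """Median of data grouped in class intervals, counted from either end."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no measures given")
    half = n / 2                                    # step 2: the half sum
    rows = list(zip(lower_limits, frequencies))     # ascending order
    if from_upper_end:
        rows.reverse()
    counted = 0                                     # step 3: measures passed so far
    for lower, f in rows:
        if f > 0 and counted + f >= half:
            remainder = half - counted              # step 4
            part = remainder / f * width            # step 5
            if from_upper_end:                      # step 6
                return lower + width - part
            return lower + part
        counted += f


# Table XXIV, written from the lowest class interval upward.
lower_limits = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
frequencies = [1, 1, 1, 3, 2, 16, 33, 38, 47, 38, 63, 20]

from_bottom = grouped_median(lower_limits, frequencies, 5)
from_top = grouped_median(lower_limits, frequencies, 5, from_upper_end=True)
print(round(from_bottom, 2), round(from_top, 2))
```

Both counts give 83.88: from the lower end 36.5 of the 47 cases are taken and the result added to 80, and from the upper end 10.5 of the 47 are taken and the result subtracted from 85.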

If, in finding the median, no other measures are desired, 
care need be taken only in the arrangement of the cases
near the median value. Consequently the median cannot 
give detailed information of the measures at the extremities 
of the ranges. On the other hand, the median is a fairly 
stable measure and changes very slowly when different 
samples are taken, which means that it is not greatly 
affected by the presence of accidental and irrelevant in- 
fluences. 

Comparison of the Arithmetic Mean, Mode, and Median.
We are now in a position to compare the three kinds of
averages with a view of determining their more salient
characteristics. Following is the comparison of the
arithmetic mean, the mode, and the median:



Arithmetic Mean

1. The arithmetic mean takes into consideration all
cases and is affected by their size.

2. The arithmetic mean is affected by every item in the
distribution.

3. The mean may be found without arranging the measures
according to their magnitude.

4. The mean may be determined when the aggregate and
the number of cases are known.

5. The mean may fall where no data actually exist.

Mode

1. The mode deals with only the most representative cases.

2. The mode is determined by the most frequent cases.

3. The measures ... according to their magnitude.

4. The mode may be located without the number of cases or
the extreme cases.

5. The mode falls where the cases are most numerous.

Median

1. The median takes into consideration all the cases; but
the size of extreme cases does not affect it.

2. The median is a counting average and is affected by
the number of cases, but not by the size of the extreme
cases.

3. The measures must be arranged according to their
magnitude.

4. The sum of the measures and their number do not furnish
sufficient data to compute the median.

5. The median, like the mean, may be interpolated, and
fall where no case actually exists.



A comparison of the above averages indicates that the 
nature of the data and the problem to be solved must deter- 
mine the average to be used. If the size of the measures 
and the number of cases are to be taken into consideration, 
then the arithmetic mean is the average to use. If, however, 
the most characteristic measure of the group is 
wanted, the mode best satisfies this condition. The arithmetic 
mean has the advantage of being a common measure 
and one with which the public is familiar. Its calculation 
is simple, but it is greatly affected by extreme cases and 
for that reason it should many times give way to the 
median or mode. One disadvantage of the mode is the fact 
that there is many times no well-defined type, and one 



MEASUREMENTS OF AVERAGES 303 

measure appears as often as another. In this case the 
median is probably the most representative term. 
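
The effect of a single extreme case on each of the three averages can be checked on a small series. The following is a minimal sketch in Python; the salary figures are hypothetical and are not taken from the text, and the helper name `averages` is introduced here only for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical salaries in dollars (illustrative only, not data from the text).
salaries = [1100, 1200, 1200, 1200, 1300, 1400, 1500]
with_extreme = salaries + [6000]  # the same series with one extreme case added


def averages(values):
    """Return (arithmetic mean, median, list of modes) for a series."""
    return mean(values), median(values), multimode(values)


base = averages(salaries)
shifted = averages(with_extreme)

print("without extreme case:", base)
print("with extreme case:   ", shifted)
```

In this sketch the extreme case raises the mean by several hundred dollars, moves the median only from one middle value to the point halfway to the next, and leaves the mode unchanged. When every value occurs once, `multimode` returns all of them, which corresponds to the case noted above in which there is no well-defined type.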
Quartiles and Percentiles. — It is sometimes convenient 
to divide the distribution into divisions smaller than those 
made by the median, that is, to divide it into quarters, 
tenths, etc. The medians dividing the halves of the dis- 
tribution into equal parts are known as quartiles. Start- 
ing with the lower end of the distribution, the median 
dividing the lower half is known as the first quartile, 
($Q_1$), whereas the median dividing the upper half of the 
distribution is known as the third quartile ($Q_3$). The 
computation of the quartiles is the same as that of the median except 
that in the first quartile we take the $\frac{N}{4}$ case and in the 
third quartile the $\frac{3N}{4}$ case instead of the $\frac{N}{2}$ case, as in the 
median. 

In the same way we may find any desired percentile 
in the distribution. If the distribution is divided into 
ten equal parts, the division points are known as deciles. 
A series of such measures gives a more complete picture of 
the distribution than can be obtained from a single 
measure. 
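
The counting and interpolation described above can be sketched in Python. This is an illustrative sketch rather than the author's procedure verbatim: the function name `grouped_point` is hypothetical, cases are assumed to be spread evenly within each class interval, and the frequencies are those of the 288 plane-geometry scores tabulated later in this chapter (Table XXX).

```python
# Class intervals as (lower limit, width, frequency), lowest interval first.
# Frequencies are those of the 288 plane-geometry scores in Table XXX.
INTERVALS = [
    (40, 5, 2), (45, 5, 3), (50, 5, 6), (55, 5, 9), (60, 5, 21), (65, 5, 18),
    (70, 5, 23), (75, 5, 48), (80, 5, 27), (85, 5, 49), (90, 5, 62), (95, 5, 20),
]


def grouped_point(intervals, fraction):
    """Point on the scale below which `fraction` of the cases lie.

    Counts up from the lower end of the distribution and interpolates
    within the class interval that contains the required case.
    """
    n = sum(freq for _, _, freq in intervals)
    target = fraction * n          # N/4, N/2, 3N/4, N/10, ...
    below = 0                      # cases in the intervals already passed
    for lower, width, freq in intervals:
        if freq > 0 and below + freq >= target:
            return lower + (target - below) / freq * width
        below += freq
    raise ValueError("fraction must lie between 0 and 1")


q1 = grouped_point(INTERVALS, 0.25)
md = grouped_point(INTERVALS, 0.50)
q3 = grouped_point(INTERVALS, 0.75)
deciles = [grouped_point(INTERVALS, k / 10) for k in range(1, 10)]

print(round(q1, 2), round(md, 2), round(q3, 2))
```

For these data the median agrees with the value 82.59 used later in the chapter; the same function gives any decile or percentile by changing `fraction`.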




MEASUREMENTS OF DISPERSION, OR VARIABILITY 



In the previous chapter we noted the measures of central 
tendencies. We shall now note the dispersion or variation 
from these measures. Measures of variation call special 
attention to the degree of homogeneity which characterizes 
the distribution. The simplest measure of dispersion is 
the range, that is, the difference between the greatest and 
least magnitude in the series. While this is a simple measure 
of dispersion, it is a very imperfect one, as will be shown 
later, because two distributions might have the same range 
and yet differ widely in their "scatteration." A few 
illustrations will make clear why measures of variability 
are necessary in describing a group of data. 

Suppose a teacher who contemplated going to another 
state to teach school was told that the mean salary of 
teachers in that state was $1,250. It is evident that the 
mean does not convey sufficient information to make one 
intelligent as to what his chances are of getting the mean 
salary. It might be that nine-tenths of the teachers received 
salaries between $1,200 and $1,600, in which case one's 
chance of getting a salary pretty close to the average would 
be good. Or it might be that no teacher got a salary within 
$200 of the average. The measure of central tendency does 
not, therefore, give one anything like exact information as 
to what to expect. 

When one says that the average grade or the median grade 
of a freshman class in algebra is 75 per cent, it does not 
indicate the distribution of the grades in that class. They 

304 



MEASUREMENTS OF VARIABILITY 305 

may range from 5 per cent to 100 per cent, or from 65 per 
cent to 80 per cent, or cover any other range from 0 per cent 
to 100 per cent. The description of the distribution is far 
more nearly complete when the dispersion or " scatteration " 
is given. 

At best the average is but a partial measure of type. If 
one desired to compare the salaries of teachers in two states 
or the grades in two schools and had nothing but the 
measures of central tendency, it would be impossible to 
draw any definite conclusions, because the forms of the 
distributions might vary greatly. Two distributions might 
have the same range and the same mean, median, and mode 
and still differ widely as to form. In one distribution the 
cases might be concentrated near the measure of central 
tendency while in the other they might be distributed 
about equally over the entire range. Or, again, the measures 
in one distribution might be concentrated at the ends 
of the range while in another distribution they might be 
concentrated at or near the center. 

How Variability Is Measured. — We noted in the previous 
chapter that a measure of central tendency was a 
position, or point on the scale. Variability differs from a 
measure of central tendency in that the latter is a point or 
a position on the scale whereas the former is a distance. 
Variability is expressed as the distance on the scale that 
will include a certain proportion of the measures in the dis- 
tribution. This distance is expressed in various units, 
depending in part on the nature of the distribution and in 
part on the arbitrary choice of the statistician. 

Measures of Absolute Variability. — There are four measures 
of absolute variability in common use. They are: 

1. The range, which includes all of the measures in the 
distribution. 

2. The mean deviation is the mean of all the deviations 
from a measure of central tendency, such as the median or 
mean. When laid off on each side of the average in a normal 
distribution, it includes somewhat more than the middle half 
of the cases (about 57.5 per cent). 

3. The standard deviation is the square root of the mean 
of the squares of all deviations when the deviations are 
measured from a measure of central tendency, either the 
mean or the median. When thus laid off, it includes approximately 
the middle two-thirds of the distribution. 

4. The quartile deviation, or median deviation. The quartile 
or median deviation applies to that portion of the 
distribution contained between the first and third quartiles 
and is computed by taking one-half the range contained in 
the middle half of the distribution. It may be computed 
from the formula 

$$Q = \frac{Q_3 - Q_1}{2}$$

where $Q_3$ represents the third quartile and $Q_1$ the first 
quartile. 

Another term frequently used is the probable error discussed 
in a previous chapter. If the distribution is symmetrical 
then the probable error equals the median deviation and 
includes the middle half of the cases in the distribution. If 
the distribution is not symmetrical, but skewed, it is 
questionable whether the term should be used. The term really 
belongs in sampling. Sampling is used under the following 
conditions: In statistics it is usually impossible, or at least 
not convenient, to obtain all the measures of any group of 
things under consideration, so we take samples and judge 
the entire group by the samples taken. When a farmer 
brings his wheat to town, the man at the elevator does not 
examine critically the entire load but takes a sample from 
the front end of the load, one from the rear, and perhaps 
one from the middle, and judges the entire load from the 
samples taken. In a city school system a superintendent 
may desire to know what score the fourth-grade children 
are able to make on a certain test. It may be that he does 
not have time to test the fourth grade throughout the entire 
city. He therefore chooses a few schools at random and 
judges the entire fourth grade by the samples taken. But 
to be scientific he must know how reliable his samples are. 
That is, if he had taken samples from other schools, would 
they have differed radically from those taken? How dif- 
ferent would the results have been if he had tested the 
fourth grade through the entire city? Probable error is a 
means of testing the reliability of samples. It is a quantity 
such that we would obtain values of greater and less 
magnitude with equal frequency if more cases or samples were 
taken. If the distribution is normal, it is evident that there 
will be as many measures greater than those in the middle 
half as there are those that are less. Or, it is a "fifty-fifty" 
chance, when measures are under consideration not included 
in the middle half, that they will be larger than those in the 
middle half as often as they are smaller. For example, 
suppose a city superintendent desired to know the mean 
score of the eighth-grade children on the Monroe Silent 
Reading Test and that he did not have time to test all the 
eighth-grade children throughout the entire city or that the 
expense were too great. He might then test ten classes, 
for instance, and determine the mean grade of the classes 
chosen. Let us suppose that the mean score was 25. He 
would next want to know how near this score of 25 would 
be to the mean score if he had tested all of the eighth-grade 
children throughout the entire city. He knows that as he 
went from school to school testing the eighth grades, the 
mean score of some of them was more than 25 and for others 
it was less. In making his calculations let us suppose that 
he found the probable error to be 6. This means that in 
taking his ten samples as often as he found a measure 19 
or less (6 below 25), he would find one 31 or more (6 more 
than 25). It is evident that the greater the difference in 
reading ability found in the samples taken, the greater the 
probable error. In other words a small P.E. means a 
rather homogeneous group. It is also evident that P.E. 
would not be the correct measure to use unless the 
distribution were normal. Since the quartile deviation is 
determined by counting the number of measures between the 
first and third quartiles, and taking one-half of them, it is 
really not deviation at all. 
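
For a normal distribution, the statement that the probable error marks off the middle half of the cases can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist`; the factor of about 0.6745 standard deviations is a general property of the normal curve, and applying it to the superintendent's figures (mean 25, P.E. 6) assumes, as the text does, that the sample means are normally distributed.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Distance from the mean, in standard deviations, that bounds the middle half.
pe_in_sigmas = std_normal.inv_cdf(0.75)
middle_half = std_normal.cdf(pe_in_sigmas) - std_normal.cdf(-pe_in_sigmas)

# The superintendent's example: mean score 25, probable error 6.
# Under the normality assumption the standard deviation implied is 6 / 0.6745.
scores = NormalDist(mu=25.0, sigma=6.0 / pe_in_sigmas)
share_19_or_less = scores.cdf(19.0)
share_31_or_more = 1.0 - scores.cdf(31.0)

print(round(pe_in_sigmas, 4), round(middle_half, 2))
print(round(share_19_or_less, 2), round(share_31_or_more, 2))
```

Under these assumptions one quarter of the values fall at 19 or below and one quarter at 31 or above, so values beyond the probable error on either side occur with equal frequency.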

Computation of the Mean Deviation. — We have indicated 
above that the quartile deviation is not a deviation 
from any particular average and takes account of the form 
of the distribution only in an indirect manner. The mean 
deviation, on the other hand, is a real measure of deviation 
from a measure of central tendency. 

The illustration in Table XXIX will make clear the 
significance of the mean deviation. 



Table XXIX. — Grades Made by Ten Pupils in Algebra 

Pupils        Grade        Deviation from Mean 

  a             79               6.6 
  b             91               5.4 
  c             76               9.6 
  d             86               0.4 
  e             88               2.4 
  f             90               4.4 
  g             95               9.4 
  h             94               8.4 
  i             78               7.6 
  j             79               6.6 

             10)856           10)60.8 
                85.6 = mean      6.08 = mean deviation 



We note that the mean grade for this group of ten pupils 
is 85.6. No pupil makes the mean grade. The grade 
made by a differs from the mean by 6.6. That made by b 
differs by 5.4, and so on. The sum of all the deviations 
from the mean, without reference to signs (that is, without 
reference to whether they are above the mean or below it), 
divided by the number of cases, which is 10, gives the average 
or mean deviation, 6.08. 
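
The computation for Table XXIX can be sketched in a few lines of Python. The grades are those read from the table; entries that are illegible in the scan are inferred from the printed total of 856 and the printed deviations, and the function name `mean_deviation` is introduced here only for illustration.

```python
from statistics import mean

# Grades of the ten pupils in Table XXIX (illegible entries inferred from
# the printed total, 856, and the printed deviations).
grades = [79, 91, 76, 86, 88, 90, 95, 94, 78, 79]


def mean_deviation(values, center):
    """Mean of the deviations from `center`, taken without regard to sign."""
    return sum(abs(v - center) for v in values) / len(values)


m = mean(grades)
md = mean_deviation(grades, m)

print(m, round(md, 2))
```

The same function accepts the median as `center`, which is the comparison taken up in the next paragraphs.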

The mean deviation may be computed from either the 
mean or the median. It would be the same from either if 
the distribution were symmetrical. If only slightly 
unsymmetrical, it would be the same to the second decimal point. 
If the distribution is considerably skewed, however, the 
deviation is less from the median because the arithmetic 
mean is affected by both the size of the items and the 
frequencies. In the case of the median only the frequencies 
and the size of the items near the center of the distribution 
affect this measure. The following illustration from Bowley 
will make clear the reason why deviations from the median 
are less than from the mean.¹ 

Suppose that it is required to run from a telephone exchange 
separate wires to every one of N places in a straight line, where 
should the exchange be placed so as to use the least total amount 
of wire? At the median position. For if you move from the 
median position to the right, or to the left, you will find immediately 
that you are adding more wire than you are subtracting. Sup- 
posing there are 20 stations, and you have a position between the 
10th and 11th; if you move to a position between the 11th and 
12th you have to increase your distance from ten stations and 
diminish it from nine, in every case by the same length of the wire. 
The wires correspond to the deviations; and the sum of the 
lengths of the wires is the sum of the lengths of the deviations. 

It is evident that with an arithmetic mean there may be 
more cases on one side of the mean than on the other; there- 
fore, the sum of the distances from the mean to the various 
cases would be greater than from the median. From a 
mathematical standpoint it would seem, therefore, that the 
median is the proper measure of central tendency to use in 
computing the mean deviation. 
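
Bowley's telephone-exchange illustration can be tried numerically. The sketch below is illustrative only: the station positions are hypothetical, and the function name `total_wire` is not from the text. It measures the total length of wire for an exchange placed at the median, at the arithmetic mean, and at every whole-number position along the line.

```python
from statistics import mean, median

# Hypothetical positions of seven stations along a straight line.
stations = [1, 2, 4, 7, 8, 15, 30]


def total_wire(positions, exchange):
    """Sum of the distances from the exchange to every station."""
    return sum(abs(p - exchange) for p in positions)


at_median = total_wire(stations, median(stations))
at_mean = total_wire(stations, mean(stations))

# Try every whole-number position on the line and keep the cheapest one.
best_position = min(range(0, 31), key=lambda x: total_wire(stations, x))

print(at_median, round(at_mean, 2), best_position)
```

With these positions the least total wire is found at the median station, and moving the exchange to the arithmetic mean, which the distant station at 30 pulls to the right, lengthens the total.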

¹ A. L. Bowley, Measurement of Groups and Series, p. 30. 

Computation of the Mean Deviation: Data Grouped in 
a Frequency Distribution. — In Table XXIX we illustrated 
the computation of the mean deviation where the number 
of cases was small and the data were ungrouped. We shall 
now illustrate the computation with data grouped in a 
frequency distribution. Table XXX illustrates the method. 

Table XXX. — Distribution of Scores Given to 288 High-School 
Pupils in Plane Geometry, Illustrating the Computation of 
the Mean Deviation by the Long Method 

Class          Mid-point of    Frequency    Deviation    Frequency × Deviation 
Interval       Interval            f            d                 fd 

95-100            97.5            20          14.91            298.20 
90- 94.99         92.5            62           9.91            614.42 
85- 89.99         87.5            49           4.91            240.59 
80- 84.99         82.5            27           0.09              2.43 
75- 79.99         77.5            48           5.09            244.32 
70- 74.99         72.5            23          10.09            232.07 
65- 69.99         67.5            18          15.09            271.62 
60- 64.99         62.5            21          20.09            421.89 
55- 59.99         57.5             9          25.09            225.81 
50- 54.99         52.5             6          30.09            180.54 
45- 49.99         47.5             3          35.09            105.27 
40- 44.99         42.5             2          40.09             80.18 

                               N = 288                        2,917.34 

N/2 = 144          2,917.34 ÷ 288 = 10.13 M.D. 

The true median is 82.59 (computation not shown here). 
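
The long method of Table XXX amounts to weighting each mid-point's distance from the true median by its frequency. A minimal Python sketch follows; the function name `mean_deviation_long` is hypothetical, and the true median is taken as 82.59, the value given in the text.

```python
# Mid-points and frequencies of Table XXX, highest class interval first.
MIDPOINTS = [97.5, 92.5, 87.5, 82.5, 77.5, 72.5, 67.5, 62.5, 57.5, 52.5, 47.5, 42.5]
FREQS = [20, 62, 49, 27, 48, 23, 18, 21, 9, 6, 3, 2]
TRUE_MEDIAN = 82.59  # value stated in the text


def mean_deviation_long(midpoints, freqs, center):
    """Return (sum of f*d, mean deviation), with d = |mid-point - center|."""
    n = sum(freqs)
    sum_fd = sum(f * abs(m - center) for m, f in zip(midpoints, freqs))
    return sum_fd, sum_fd / n


sum_fd, md = mean_deviation_long(MIDPOINTS, FREQS, TRUE_MEDIAN)
print(round(sum_fd, 2), round(md, 2))
```

Every product in the fd column involves a fractional deviation, which is the labor the short method described next is meant to avoid.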

The computation of the mean deviation by the method 
given in Table XXX involves a great amount of work by 
reason of the fact that the exact deviations are taken from 
the true median. These, in most cases, are fractions that 
must be multiplied by the frequency. Much labor may be 
saved by assuming that the median is at the mid-point of 
the class interval in which the true median is located and by 
reckoning the deviation in terms of class intervals instead 
of in terms of the units of class intervals. We may then make 
the proper correction for the assumptions made. That is, if 
the deviations are reckoned about an assumed median instead 
of the true median, the proper corrections must be made for 
the difference between the assumed and the true medians. 

It will be found that the sum of the deviations about the 
assumed median is always less than those about the true 
median; hence the correction must always be added. 

This may be illustrated by Figure VIII. Suppose we have 
a distribution the range of which is from 40 to 100 and that 
the range is divided into class intervals of five units each. 
Let us suppose that the true median is 82.59 as in Table XXX. 
The assumed median is at 82.5. Since the true median is 82.59, 
it means that there are as many pupils who receive grades 
above 82.59 as there are below it. But we make an assumption 
that the median is located at 82.5 and that all the measures 
in the interval 80-85 lie at the mid-point of this class 
interval. Therefore the 27 cases would be considered as lying 
below the true median and would be counted with those below. 
It is also evident that there are now more cases below the 
true median than above it, when the 27 cases in the 80-85 
interval are assumed to be located at the mid-point, or at 
82.5. This means that the deviations of all the cases below 
the true median are too short by the difference between the 
true and assumed medians, or that they are short 0.09 of a 
unit or 0.09/5 of a class interval, or 0.018 of a class 
interval. It also means that the deviations of all the cases 
above the true median are too long by the difference between 
the true and assumed medians. The number above is 131, 
since the cases in the 80-85 interval are counted with those 
below. The number below is 157. Therefore 131 cases 
are too long and 157 cases too short, and the difference 
between them, or 26 cases, are too short by 0.018 of a class 
interval, hence the correction must be added. If the true 
median had been below the assumed median, the number 
of cases above would have exceeded those below, the number 
of cases above would have been too short, and the correction 
would have had to be added as in the case cited. 

[Figure VIII. Class intervals from 40 to 100 with their 
frequencies, showing the true median (82.59) and the assumed 
median (82.5).] 

Table XXXI illustrates the computation of the mean 
deviation by the short method. In this table we shall use 
the same data used in Table XXX. 

Table XXXI. — Distribution of Scores Given to 288 High-School 
Pupils in Plane Geometry, Illustrating the Computation 
of the Mean Deviation by the Short Method 

Class Interval        f        d        fd 

95-100               20        3        60 
90- 94.99            62        2       124 
85- 89.99            49        1        49 
80- 84.99            27        0         0 
75- 79.99            48        1        48 
70- 74.99            23        2        46 
65- 69.99            18        3        54 
60- 64.99            21        4        84 
55- 59.99             9        5        45 
50- 54.99             6        6        36 
45- 49.99             3        7        21 
40- 44.99             2        8        16 

                  N = 288          Σfd = 583 

Assumed median 82.5 
131 number of cases above assumed median 
26 × .018 = 0.468 
(583 + 0.468) ÷ 288 = 2.026 in units of class intervals 
2.026 × 5 = 10.13 M.D. 

If it is desired to use a general algebraic formula for finding 
the mean deviation by the short method, the operation may 
be expressed thus: 

$$\text{M.D.} = \frac{\sum fd + c\,(N_b - N_a)}{N} \times S$$

where M.D. = the mean deviation, $f$ = the number of cases 
in each class interval, $d$ = the deviation in units of class 
intervals, $c$ = the correction, which is the true median minus 
the assumed median, divided by the number of units in the 
class interval; $N_b$ = the number of measures below the true 
median, $N_a$ = the number of measures above the true median, 
$N$ = the number of measures in the entire distribution, and 
$S$ = the number of units in the class interval. Substituting 
in the general formula to find the mean deviation in Table 
XXXI: 

$N_b = 157$, $N_a = 131$, True median = 82.59, Assumed median = 82.5 

$$\frac{583 + 0.018\,(157 - 131)}{288} = \frac{583 + 0.468}{288} = 2.026$$

2.026 × 5 = 10.13 M.D. 



Summary of Steps in the Computation of the Mean 
Deviation by the Short Method 

1. Arrange the data in a frequency distribution of four columns. 
Let the first column to the left contain the class intervals; the 
second one the frequencies (f); the third the deviations (d); and 
the fourth the product of the frequencies times the deviations (fd). 

2. Sum the frequencies in the f column. 

3. Compute the true median (TMd). 

4. Take the mid-point of the class interval containing the true 
median as the assumed median (AMd). 

5. Find the difference between the true median and the assumed 
median and divide this difference by the number of units in the 
class interval. This will give the difference in terms of the class 
interval. Call this the correction (c). 

6. Compute the number of cases above and below the true 
median in column f. Care must be taken to see whether the cases 
in the class interval containing the medians shall be added to the 
cases above the true median or below it. If the assumed median 
falls below the true median, then the cases in the class interval 
containing the medians must be added to those below the true 
median for reasons given on page 311. If the assumed median 
falls above the true median, add the cases in the class interval 
containing the medians to those above. Multiply the correction 
found in step 5 by the difference between the number of cases above 
and the number below the true median. 

7. Tabulate the deviations (d) from the mid-point of the class 
interval containing the assumed median, giving the first class 
interval above a deviation of 1, the second one a deviation of 2, 
etc. Record the deviations below the interval containing the 
assumed median in the same way. 

8. Multiply each frequency (f) by its respective deviation and 
record the results in the proper place in the fd column. 

9. Find the sum of the fd's without regard to sign. 

10. Add the total correction from step 6 above to the total number 
of deviations from the median (Σfd). 

11. Divide this sum by the total number of cases in the distribution 
(N) to get the mean deviation about the true median. The 
deviation thus computed is in terms of class intervals; therefore 
multiply this result by the number of units in the class interval in 
order to find the deviation in terms of the original measures. 
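
The eleven steps above can be sketched as a single Python function. This is one possible reading of the procedure, not a definitive implementation: the function name `mean_deviation_short` and its parameters are hypothetical, equal class intervals are assumed, and the true median is supplied as the 82.59 given in the text.

```python
# Lower limits of the class intervals (ascending) and their frequencies.
LOWERS = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
FREQS = [2, 3, 6, 9, 21, 18, 23, 48, 27, 49, 62, 20]


def mean_deviation_short(lowers, freqs, width, true_median):
    """Mean deviation about the true median by the short method."""
    n = sum(freqs)                                              # step 2
    # Index of the class interval that contains the true median.
    k = next(i for i, lo in enumerate(lowers) if lo <= true_median < lo + width)
    assumed = lowers[k] + width / 2                             # step 4
    c = abs(true_median - assumed) / width                      # step 5
    below = sum(freqs[:k])
    above = sum(freqs[k + 1:])
    # Step 6: the median interval is counted with the side on which the
    # assumed median falls relative to the true median.
    if assumed < true_median:
        below += freqs[k]
        correction = c * (below - above)
    else:
        above += freqs[k]
        correction = c * (above - below)
    # Steps 7-9: deviations in class intervals, summed without regard to sign.
    sum_fd = sum(f * abs(i - k) for i, f in enumerate(freqs))
    md_in_intervals = (sum_fd + correction) / n                 # steps 10-11
    return sum_fd, correction, md_in_intervals * width


sum_fd, correction, md = mean_deviation_short(LOWERS, FREQS, 5, 82.59)
print(sum_fd, round(correction, 3), round(md, 2))
```

For the data of Table XXXI this sketch reproduces the sum 583, the total correction 0.468, and the mean deviation 10.13 obtained by the long method.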



The Computation of Standard Deviation. — We defined 
standard deviation (page 306) as the square root of the 
arithmetic mean of the squares of all the deviations measured 
from the arithmetic mean of the distribution. 

If the series is simple and there is only one or a few 
measures of any given magnitude, the formula in this case 
is as follows: 

$$\sigma = \sqrt{\frac{\sum d^2}{N}}$$

In substituting in this formula we simply find the amount 
each measure deviates from the mean, square it, add the 
squares of all the deviations, divide by the total number of 
cases, and extract the square root. 

If, however, the number of cases is large, we arrange the 
data in a frequency distribution and apply the short method. 
The formula in this case is 

$$\sigma = \sqrt{\frac{\sum fd^2}{N}}$$

in which $\sigma$ = the standard deviation, $\sum$ = the sum of the $fd^2$'s, 
$f$ = the frequencies or the number of measures in each class 
interval, $d$ = the deviations from the mean, and $N$ = the 
number of measures in the whole distribution. 

The Computation of Standard Deviation by the Short 
Method. — We shall note the short method of computing 
the standard deviation and shall then compare standard 
deviation with mean deviation, and note wherein one seems 
to be superior to the other. Table XXXII illustrates 
the computation of the standard deviation by the short 
method, and the steps in the procedure are given in the 
table on the following page. 



Table XXXII. — Distribution of Scores Given to 288 High-School 
Pupils in Plane Geometry, Illustrating the Computation 
of Standard Deviation by the Short Method 

(Data taken from Table XXXI) 

Class 
Interval          f        d        fd        fd² 

95-100           20        3        60        180 
90- 94.99        62        2       124        248 
85- 89.99        49        1        49         49 
80- 84.99        27        0         0          0 
75- 79.99        48       -1      - 48         48 
70- 74.99        23       -2      - 46         92 
65- 69.99        18       -3      - 54        162 
60- 64.99        21       -4      - 84        336 
55- 59.99         9       -5      - 45        240 
50- 54.99         6       -6      - 36        216 
45- 49.99         3       -7      - 21        147 
40- 44.99         2       -8      - 16        128 

                                  -350 
                                   233 
                              288)-117    288)1,846 
                                  -.406       6.41 = S² 

c  = -0.406      6.41 - 0.165 = 6.245 = σ² 
c² =  0.165      σ = 2.50 class intervals 
                 2.50 × 5 = 12.50 actual units 

Summary of Steps in the Computation of Standard Deviation 
by the Short Method 

1. Tabulate the data in a frequency distribution as in the 
computation of the mean deviation, adding a fifth column to the right 
to contain the fd²'s. 

2. Estimate the interval which contains the mean (80-84.99). 
This may be chosen anywhere in the distribution; but if chosen 
in the same interval, or near the true mean, the computation is 
[simplified]. 

3. Tabulate the deviations from the estimated mean (the mid-point 
of the class interval) in units of class intervals; that is, the 
first interval above will have a deviation of 1, the second one a 
deviation of 2, and so on; the first interval below will have a 
deviation of -1, the second one will have a deviation of -2, and 
so on. 

4. Multiply each frequency by its corresponding deviation and 
record in the fd column. 

5. Find the algebraic sum of the fd's; that is, find the sum of 
the +fd's and the sum of the -fd's, and take their difference: 
Σfd = 233 - 350 = -117. 

6. Find the correction (c) by dividing Σfd by the total number 
of cases in the distribution: -117 ÷ 288 = -0.406. 

7. Multiply each fd by d, its corresponding deviation, and record 
in the column headed fd². (The student should use the table of 
squares, Table 1, Appendix, for squaring numbers, also for 
extracting roots.) 

8. Find the sum of the fd²'s: Σfd² = 1,846. 

9. Divide the sum of the fd²'s by the number of cases in the 
whole distribution to get S²: 1,846 ÷ 288 = 6.41. This is the 
square of the standard deviation, but it is computed from an 
estimated mean, and we must find it from the true mean. From 
the previous discussion it is clear that the mean of the deviations 
about the estimated mean must be in error by an amount equal 
to the arithmetic mean of the difference of the positive and negative 
deviations in column 4; that is, the arithmetic mean of the 
squares of the deviations will be in error by an amount equal to 
the square of this difference, or c². Square the correction c, 
giving c², or 0.165, and subtract c² from S², giving σ². It was noted 
in step 3 that the deviations were in terms of class intervals; 
therefore σ² will be in terms of class intervals, and its square root 
will, of course, be in the same terms. Multiplying σ by the number 
of units in the class interval (5) will give the desired standard 
deviation, 12.50. 
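
The nine steps above can be sketched in Python as follows. The function name `std_dev_short` and its parameters are hypothetical, and equal class intervals are assumed. One point should be read with care: multiplying out the 55-59.99 row directly gives 9 × 25 = 225, where the printed table appears to show 240, so the sketch arrives at Σfd² = 1,831 and a standard deviation of about 12.44 rather than the printed 1,846 and 12.50.

```python
from math import sqrt

# Frequencies of the class intervals 40-44.99, 45-49.99, ..., 95-100.
FREQS = [2, 3, 6, 9, 21, 18, 23, 48, 27, 49, 62, 20]
WIDTH = 5
GUESS_INDEX = 8  # the 80-84.99 interval, taken to contain the estimated mean


def std_dev_short(freqs, width, guess_index):
    """Return (sum of fd, sum of fd^2, standard deviation in original units)."""
    n = sum(freqs)
    devs = [i - guess_index for i in range(len(freqs))]   # step 3
    sum_fd = sum(f * d for f, d in zip(freqs, devs))      # steps 4-5
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, devs)) # steps 7-8
    c = sum_fd / n                                        # step 6
    s2 = sum_fd2 / n                                      # step 9: S squared
    sigma_in_intervals = sqrt(s2 - c * c)                 # subtract c squared
    return sum_fd, sum_fd2, sigma_in_intervals * width


sum_fd, sum_fd2, sigma = std_dev_short(FREQS, WIDTH, GUESS_INDEX)
print(sum_fd, sum_fd2, round(sigma, 2))

# Choosing a different interval for the estimated mean changes c and S squared
# but leaves the corrected standard deviation the same.
_, _, sigma_other = std_dev_short(FREQS, WIDTH, 5)
```

The last two lines illustrate the remark in step 2: the estimated mean may be placed in any interval, because the correction c² removes the effect of the choice.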

The methods of computing the mean and standard devia- 
tions having been presented, we are now in a position to 
compare them and discuss the superiority of one over the 
other. 

In the introductory chapter on statistical methods we 
devoted considerable space to the exposition of the arith- 
metical mean as the basis upon which rests the concept of 
error in a series of observations. We also discussed the 
method of least squares as a means of finding the most 
probable value of a series of observations. We are now 
prepared to show the significance of mean and standard 
deviations graphically. This can be done best, perhaps, 
by using the illustrations given by Roberts.¹ Suppose 
A and B are two marksmen firing at a target the center of 
which is C, Figure IX. 

Suppose each man fires ten shots. Let the crosses represent 
the hits made by A and the circles the hits made by B. 




Table XXXIII shows the distance of each shot from the 
center of the target. 

Table XXXIII 

                Distance of Shots from Center of Target, Inches 
Shot Number              A                B 

     1                   2                4 
     2                   8                5 
     3                   3                6 
     4                   7                4 
     5                   9                3 
     6                   4                5 
     7                   2                5 
     8                   5                6 
     9                   9                7 
    10                   1                5 

                        50               50 

Considering the amount each man misses the target as a 
deviation, we find that the sum of the deviations of each 
marksman is 50 and the mean deviation is 5. Therefore, 
if mean deviation were taken as a measure of their 
marksmanship, there would be a tie. If, however, we were to 
compute their marksmanship in terms of standard deviation, 
they would not tie, but B would be the winner. 

Table XXXIV shows the standard deviation in each 
series. 

Formerly the scores at rifle practice in the Belgian army 
were determined by adding the deviations of each man's 
shots from the center of the target, and the marksman 
having the smallest sum was the winner. 

It is evident from looking at the scores made by A and B 
in Table XXXIV that B is a more consistent marksman 
than A because his marksmanship is less variable; that is, 
he "bunches his shots." At one time A fires a shot close 
to the target, and the next shot misses the target by many 
feet. B does not hit as close to the target as does A but is 
far more consistent in his shooting. 
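
The comparison of the two marksmen can be reproduced with a short Python sketch. The distances are those of Tables XXXIII and XXXIV, with entries that are illegible in the scan inferred from the printed totals (50 for each sum of distances, 334 and 262 for the sums of squares); the function names are introduced here only for illustration. Both measures are taken from the center of the target, as in the text, not from the mean of each man's shots.

```python
from math import sqrt

# Distance of each of the ten shots from the center of the target, in inches.
A = [2, 8, 3, 7, 9, 4, 2, 5, 9, 1]
B = [4, 5, 6, 4, 3, 5, 5, 6, 7, 5]


def mean_miss(distances):
    """Mean deviation: the average distance from the center."""
    return sum(distances) / len(distances)


def root_mean_square_miss(distances):
    """Standard deviation about the center: root of the mean squared distance."""
    return sqrt(sum(d * d for d in distances) / len(distances))


print(mean_miss(A), mean_miss(B))
print(round(root_mean_square_miss(A), 2), round(root_mean_square_miss(B), 2))
```

The mean miss is 5 inches for both men, while the root-mean-square miss is larger for A (the square root of 33.4) than for B (the square root of 26.2), because squaring gives the widest shots the greatest weight.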

Table XXXIV 

Number                A                     B 
of Shots          d        d²           d        d² 

    1             2         4           4        16 
    2             8        64           5        25 
    3             3         9           6        36 
    4             7        49           4        16 
    5             9        81           3         9 
    6             4        16           5        25 
    7             2         4           5        25 
    8             5        25           6        36 
    9             9        81           7        49 
   10             1         1           5        25 

                 50    10)334          50    10)262 
                         33.4                  26.2 

√33.4 = Standard deviation of A = 5.77 
√26.2 = Standard deviation of B = 5.11 

The mathematical significance of standard deviation 
may be further demonstrated as follows: In the United 
States army, the standard size target used on rifle ranges 
at 200-300 yards distance is rectangular in shape, 4 × 6 feet 
in dimensions, in the center of which are three concentric 
rings, of which the central one (called the bull's-eye) has a 
diameter of 8 inches, the second, a diameter of 26 inches, 
and the largest one, a diameter of 46 inches. In firing at a 
target of this kind the score is made up as follows: 

First ring, or bull's-eye 5 points 

Second ring 4 points 

Third ring 3 points 

Outside target 2 points 

As indicated above, shots placed in a target may be taken 
in the sense of errors, or deviations from a mean. The 
mean in this case is the mathematical center of the target. 
The distance of each shot from the center is its deviation. 
This distance from the center of a target may be considered 
as a radius of a circle, and, since a shot may be placed at any 
point in a circle around the center of the target within the 
limits of its radius, we may treat the various misses or 
deviations from the center as being proportional to the 
area of a circle $\pi r^2$, of which $r$ is the distance of the shot 
from the center. We may then determine the sum of the 
areas of the various circles formed by each man's shots taken 
as radii, and, from our reasoning above, the man having the 
smaller total sum of circle areas will be the winner. Or, 
stating the relative merits of their marksmanship in a 
mathematical formula we may say $A : B :: \Sigma(\pi r^2) : \Sigma(\pi r_1^2)$. 
$\pi$, being a constant, may be eliminated, and the formula 
may be stated thus: $A : B :: \Sigma r^2 : \Sigma r_1^2$. The rifleman 
having the smaller sum of squared deviations would win. 

If, however, we wish to carry the measurements a step 
further and obtain a measure of the relative average fluctu- 
ation or miss from the center of the target which will be an 
absolute measure of their relative marksmanship, we may 
divide the sum of the squared radii ($\Sigma r^2$) by the number of 
shots ($N$) in order to get the average squared error or miss. 
The square root of this will be the best possible measure of 
variability from the center of the target taken as a mean. 

Another point in favor of the use of standard deviation 
is the fact that it bears a definite relation to the normal 
probability curve, or curve of error. It bears the same 
relation to the curve that the radius of a circle bears 
to the circle. It is therefore a constant and limits the 
spread of the curve so that, when the standard deviation is 
small, the measures are concentrated near the center, and 
the curve rises rapidly, whereas, if the standard deviation 
is great, the curve is flat, and the measures are scattered 
widely from the center. As stated above, it is a constant 
and marks the point where the curvature of the curve 
changes from the convex to the concave as you go from the 
center. The range of one standard deviation on either side 
of the mean includes 68.26 per cent of the cases. 
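
The proportion of cases lying within one standard deviation of the mean can be checked numerically. The sketch below assumes a normal curve and uses the error function from Python's standard library; the helper name `fraction_within` is ours. The result rounds to 68.27 per cent, which agrees with the figure quoted above to within the last decimal place.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard
    deviations on either side of the mean."""
    return math.erf(k / math.sqrt(2))

within_one_sd = fraction_within(1.0)
print(round(100 * within_one_sd, 2))
```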

The Coefficient of Variability. — In the foregoing pages 
we have noted the principal methods of representing absolute 
variability of frequency distributions. With the exception 
of the cases of variability in reference to marksmanship, no 
reference was made to the relative variabilities of two or 
more distributions. The emphasis was rather to comprehend 
more fully the distribution of the measures in reference 
to their measure of central tendency. To measure 
the amount of "scatteration," or spread from the measure 
of central tendency, we employed these measures: (1) 
mean deviation, (2) standard deviation, and (3) probable 
error. It would be a simple matter to compare the variability 
of one distribution with another by the foregoing 
methods, since the units of variability were the same; 
that is, in the case of grades they were recorded in per cent 
and in the case of marksmanship they were recorded in 
inches. It very frequently happens, however, that we desire 
to compare the variability of one distribution with another 
when the units are different, as, for instance, comparing 
the salaries of teachers with the length of time they have 
been in the service. In a distribution of this kind one of 
the units would be dollars and the other years. It is therefore 
necessary to devise a measure of relative variability to 
cover these cases. 

Pearson has devised such a measure, which he calls the 
coefficient of variation. It is the measure of the ratio of 
absolute variability (standard deviation, mean deviation, 
quartile deviation, or probable error) to the average from 
which these deviations are taken (arithmetic mean, or 
median). It may be expressed by the formula 

$$V = \frac{100\,\sigma}{M}$$

in which $V$ is the measure of variability, $\sigma$ is the standard 
deviation, and $M$ the median, or mean. By this measure 
one is merely finding the per cent that the absolute variability 
bears to the average from which the deviations are 
computed. It is clear that any other measure of variability 
might be used instead of the standard deviation. 

A measure of this kind is independent of the units used 
and will show relative variability even though the units in 
the two distributions compared are entirely different. 
Thorndike proposes to take the square root of the measure 
of central tendency instead of using it as did Pearson. His 
formula would read: 

$$V = \frac{100\ \mathrm{M.D.}}{\sqrt{\mathrm{Median}}}$$

The Pearson coefficient of variability seems to be a better 
measure, however, taking one distribution with another, 
than the one proposed by Thorndike. 
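
Both coefficients are easy to compute. The sketch below is only an illustration: the salary and service figures are hypothetical, the function names are ours, and we assume that the standard deviation is taken over the whole group (population form) and that Thorndike's mean deviation is measured from the median.

```python
import math
import statistics

def pearson_cv(values):
    """Pearson's coefficient: 100 * standard deviation / mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return 100 * sigma / mean

def thorndike_cv(values):
    """Thorndike's variant: 100 * mean deviation / sqrt(median).
    Assumption: the mean deviation is taken from the median."""
    median = statistics.median(values)
    mean_dev = statistics.fmean([abs(v - median) for v in values])
    return 100 * mean_dev / math.sqrt(median)

# Hypothetical data: salaries in dollars and service in years.
salaries = [800, 1000, 1200]
years = [2, 5, 8]

print(round(pearson_cv(salaries), 2), round(pearson_cv(years), 2))
print(round(thorndike_cv(years), 2))
```

Because the Pearson coefficient is a ratio of two quantities in the same unit, expressing the salaries in cents instead of dollars leaves it unchanged, which is what makes the comparison between dollars and years possible.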




THE MEASUREMENT OF RELATIONSHIP, OR CORRELATION 



Need for Measures of Relationship. — One more group 
of measures is necessary to equip the student with adequate 
means for dealing with educational data. We desire not 
only to know the distribution of measures in a series of 
educational data, but many times we desire to compare 
one series with another and therefore need some measure 
that takes cognizance of the distribution of the various cases 
in the series and at the same time expresses the movements 
of the group as a whole. 

It is sometimes stated that there is a high correlation 
between abilities in mathematics and abilities in Latin. 
In order to compare two groups of this kind it is necessary 
to have measures that express the efficiency of each group 
as a whole. It is sometimes necessary to know how one 
group varies as compared with another. The comparison 
of one series of data with another is usually spoken of as 
correlation. 

The measurement of relationship or correlation is a technical 
thing. Its relation to causation is also technical and 
difficult to understand. In order to help clarify the sub- 
ject and present it from a number of points of view, we 
shall refer to the statement of the problem as discussed by a 
number of the leading statisticians who have written on 
the subject. We shall also give a number of simple illustrations 
which will throw some light on the solution of the 
problem. 

Comparison involves the pairing of things or events that 
are not identical in all particulars as to time, place, and 
condition. A study of cause and effect, whether coincidence 
or sequence, becomes largely a study of association. Causes 
never operate twice under exactly the same circumstances. 
Oneness of effect is only apparent. When making compari- 
sons in education or psychology, there is a tendency to 
attempt to safeguard oneself against error and criticism by 
introducing the proviso, "other things being equal." But 
" other things " are rarely, if ever, equal in actual life. 



Individual phenomena can only be classified and our problem 
turns on how far a group or class of like, but not absolutely same, 
things which we term "causes" will be accompanied or followed 
by another group or class of like, but not absolutely same, things 
which we term "effects."¹ 

Bowley discusses correlation as follows:² 

When two quantities are so related that the fluctuations in one 
are in sympathy with fluctuations in the other, so that an increase 
or decrease of one is found in connection with an increase or decrease 
(or inversely) of the other, and the greater the magnitude of the 
changes in the one, the greater the magnitude of the changes in 
the other, the quantities are said to be correlated. 

Davenport says:³ 

The whole subject of correlation refers to that interrelation 
between separate characters by which they tend, in some degree 
at least, to move together. This relation is expressed in the form 
of a ratio. 

In reference to zero correlation he says: 

If the characters in question are absolutely indifferent the one to 
the other, the correlation is said to be zero, indicating mere association 
under the law of independent probability, without causative 
relation of any kind. 

¹ Karl Pearson, The Grammar of Science, p. 157. 
² A. L. Bowley, Elements of Statistics, p. 316. 
³ Principles of Breeding, p. 453. 

Pearson says: 

When we vary the cause, the phenomena changes, but not 
always to the same extent; it changes, but has variations in its 
change. The less the variation in that change, the more nearly 
the cause defines the phenomena, the more closely we assert the 
association or correlation to be. It is this conception of correlation 
between two occurrences embracing all relationships from 
absolute independence to complete dependence, which is the 
wider category by which we have to replace the old idea of 
causation. Everything in the universe occurs but once; there is no 
complete sameness of repetition. 

Secrist says: 

A measure of correlation is a statement of probabilities, the 
reliability of which is determined by the degree to which the 
samples represent the whole " population " and the conditions 
under which the samples are taken, the range of condition. 

We are striving to devise some methods of determining 
the degree of causal connection exhibited by certain traits 
and activities in school work. The measuring of mental, 
social, and physical activities constantly involves the study 
of causation, or causal connection, between two or more 
traits in question. 

One of the psychological problems about which there has 
been much discussion in the last decade is the question as 
to whether or not "school" abilities are specialized or 
general. For example, what is the probability that a pupil 
who shows a high degree of achievement in Latin will show 
a high degree of achievement in mathematics? If we had 
several hundred cases and we found that some students 
who were good in Latin were also good in mathematics, 
that some who were good in Latin were mediocre in mathematics, 
and that a few who were good in the subject of 
Latin were very poor in mathematics, and vice versa, it 
would be very difficult to draw a conclusion as to the 
correlation between the two traits unless we had some 
way of measuring the amounts of correspondence between 
them. 

Or, again, we might have a group of data showing that the 
student who ranked first in Latin also ranked first in mathematics, 
and the student who ranked second in Latin ranked 
second in mathematics, and so on. Data of this kind 
would show that there is a causal relation existing between 
the two traits, but it would not show how much. This 
method of measuring the degree of correspondence between 
two traits takes account only of the position or rank of the 
various measures in the series and neglects the absolute amounts 
of the measures. For the measurement to be complete the 
actual proportional differences between each two consecu- 
tive marks must be measured. 

Two methods have been devised that take cognizance 
of the actual size of the various measures of the traits and 
show the degree of correspondence between them. The 
first is the graphic method, and the second is by the use of 
mathematical formulas. While the first method does take 
cognizance of the size of each measure in the distribution, 
it nevertheless must be refined by the application of mathematics 
before the absolute value may be found. 

We shall first show how to represent the degree of corre- 
lation between two traits by the use of mathematical formu- 
lae, and then present methods for representing it graphically. 
It may seem to be more logical to present a graphic repre- 
sentation of the subject first, but it is believed that the 
subject may be made clearer by reversing what might seem 
to be the logical order. 

Perhaps enough has been said about correlation and causa- 
tion to introduce the reader to the subject. We shall now 
illustrate the computation of the coefficient of correlation 
by a number of simple problems and supplement the previous 
treatment by additional discussion as new problems 
arise. 

1. Illustrating the Computation of the Coefficient of 
Correlation: Data Simple and Ungrouped. — The unit with 
which we generally measure the degree of likeness or correlation 
of one series with another is called the coefficient of 
correlation. The formula used is that derived by Karl 
Pearson and is variously expressed as follows: 

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2}\,\sqrt{\Sigma y^2}} \qquad \text{or} \qquad r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$$

The Meaning of Symbols Used 

$r$ = the coefficient of correlation. 
$\Sigma xy$ = the summation of the x-deviations multiplied by the 
      corresponding y-deviations. ($\Sigma$ always means summation 
      when used in the formulas.) 
$x$ = the deviation of each particular measure from the average 
      in the first, or subject series. 
$y$ = the deviation of each particular measure from the average 
      in the second, or relative series. In representing the 
      deviations by x and y the proper algebraic signs must be 
      retained. 
$x^2$ = the square of the deviation of the subject series. 
$y^2$ = the square of the deviation of the relative series. 
$\sigma_x$ (second formula) = the standard deviation of the first 
      series; it corresponds to the $\sqrt{\Sigma x^2}$ in the first formula, 
      since $\sigma_x = \sqrt{\Sigma x^2 / N}$. 
$\sigma_y$ = the standard deviation in the second series. 
$N$ = the number of pairs of items in the series. 

The coefficient of correlation r is a constant and a measure 
of great importance in expressing correlation. It is evidently 
a pure number, and its magnitude is unaffected by 
the units in which x and y are measured; for the numerator 
and denominator are affected to the same extent. 
The development of the Pearson formula gives values 
for r ranging from -1 through 0 to +1. If r = +1, the 
correlation is said to be perfect and positive and means 
that large values in the first series are accompanied by large 
values in the second series, and vice versa. If r = -1, the correlation 
is perfect but negative and means that large values 
in the first series are accompanied by small values in the 
second series, and vice versa. When r = 0 the two series 
are independent. 

Table XXXV. — Illustrating the Computation of the Coefficient 
of Correlation between Addition and Handwriting 

(Hypothetical Case) 

Pupils    Scores in Addition    Scores in Handwriting 
          (Subject Series)      (Relative Series) 
A                 3                      5 
B                 4                      8 
C                 5                      7 
D                 8                     12 
E                10                     13 
Sum              30                     45 
Mean         30/5 = 6               45/5 = 9 


The deviation of each score from its respective mean is given 
below; also the product of each deviation in the first series 
by its corresponding deviation in the second series. 



Pupils    Deviations, x, from the    Deviations, y, from the     xy 
          Mean, Subject Series       Mean, Relative Series 
A                  -3                         -4                 12 
B                  -2                         -1                  2 
C                  -1                         -2                  2 
D                  +2                         +3                  6 
E                  +4                         +4                 16 
                                                      Σxy =      38 






Squaring the individual deviations to find standard 
deviations we have: 



Pupils     x²     y² 
A           9     16 
B           4      1 
C           1      4 
D           4      9 
E          16     16 
Sum        34     46 



We are now ready to substitute in the formula: 



$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}} = \frac{38}{\sqrt{34 \times 46}} = 0.961$$
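
The substitution can be verified with a short Python sketch. It follows the first form of the Pearson formula, working from the deviations about each mean; the function name `pearson_r` is ours, and the handwriting score of pupil A is taken as 5, the value implied by the deviation of -4 from the mean of 9.

```python
import math

def pearson_r(xs, ys):
    """Product-moment coefficient from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))  # numerator
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

addition = [3, 4, 5, 8, 10]      # subject series
handwriting = [5, 8, 7, 12, 13]  # relative series

r = pearson_r(addition, handwriting)
print(round(r, 3))  # 0.961
```

The coefficient is about 0.96, a high positive correlation between the two hypothetical sets of scores.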



In order to get a value for r equal to zero, it is evident 
that the numerator of the fraction in the above formula 
must be zero. A hypothetical case given in Table XXXVI 
will illustrate this point. 

Table XXXVI. — Illustrating the Computation of the Coefficient 
of Correlation Equal to Zero 

First Series   Second Series   Deviations,     Deviations,       xy 
                               First Series    Second Series 
     95             92             +25              +2           +50 
     85             90             +15               0             0 
     75             88              +5              -2           -10 
     65             88              -5              -2           +10 
     55             90             -15               0             0 
     45             92             -25              +2           -50 
Sum 420            540                                    Σxy =    0 
Mean 420/6 = 70    540/6 = 90 

Since Σxy = zero, the correlation between the two 
series is zero. 



Table XXXVII. — Illustrating a Perfect Positive Correlation 

First     Second    Deviations,    Deviations,      xy      x²      y² 
Series    Series    First Series   Second Series 
  20        30          -10            -25          250     100     625 
  24        40           -6            -15           90      36     225 
  28        50           -2             -5           10       4      25 
  32        60           +2             +5           10       4      25 
  36        70           +6            +15           90      36     225 
  40        80          +10            +25          250     100     625 
Sum 180    330                               Σxy =  700     280   1,750 
Mean 180/6 = 30     330/6 = 55 

Substituting in the formula, 

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}} = \frac{700}{\sqrt{280 \times 1{,}750}} = \frac{700}{700} = 1$$


Therefore the correlation is perfect. A perfect negative 
correlation may be illustrated in the same way. 
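
The limiting cases can be checked in the same way. The sketch below recomputes the zero and the perfect positive correlations and, as an assumption of ours rather than a table from the text, produces a perfect negative case by reversing the order of the second series. The function name `pearson_r` is our own.

```python
import math

def pearson_r(xs, ys):
    """Product-moment coefficient from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Zero correlation: the products of the deviations cancel out.
r_zero = pearson_r([95, 85, 75, 65, 55, 45], [92, 90, 88, 88, 90, 92])

# Perfect positive correlation: the deviations keep a constant ratio.
first = [20, 24, 28, 32, 36, 40]
second = [30, 40, 50, 60, 70, 80]
r_positive = pearson_r(first, second)

# Perfect negative correlation (our own illustration): the same second
# series taken in reverse order.
r_negative = pearson_r(first, list(reversed(second)))

print(r_zero, r_positive, r_negative)
```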

2. Illustrating the Computation of the Coefficient of Correlation: 
Data Complex and Grouped in Class Intervals. — 
In Tables XXXV, XXXVI, and XXXVII we illustrated 
the computation of the coefficient of correlation by simple 
problems where the number of pairs of values was five or 
six. It is evident that, if we had hundreds of pairs of 
values and attempted to use the same procedure, the amount 
of labor in the computation would be very great. A glance 
at the Pearson formula shows that the coefficient of correlation 
is found in terms of measures of central tendency 
and measures of variability; that is, we compute the mean 
of each series, find the deviation of each measure from its 
mean, multiply it by the corresponding deviation from the 
mean in the other series, and find the sum of these products 
for the numerator of the fraction. The denominator is 
the product of the standard deviations of the two series 
multiplied by the number of cases. 
There is therefore really nothing new in the development 
of this formula, since we showed in the two previous chapters 
how measures of central tendency and variability might 
be found. The chief concern here is to devise a distribution 
table that will reduce the labor to a minimum. Two plans 
will be presented. 

One will be a double entry distribution table which is 
simply a device for the arrangement of the pairs of values 
to facilitate computation, and the other will be a short 
method for computing correlations proposed by Dr. 
Leonard P. Ayres.¹ 

We shall first note the computation of the coefficient of 
correlation by the traditional method, using a double entry 
distribution table, and at the same time take advantage of 
the short methods in computing the means and deviations 
developed in Chapters IX and X. 

Table XXXVIII illustrates the tabulation of the data 
and the computation of the coefficient of correlation with 
310 cases in the subjects of Latin and mathematics. In 
this table the mathematics is known as the first or subject 
series, and the Latin, the relative or second series. The 
columns and rows are spoken of as arrays, the columns as 
y-arrays of type x, and the rows as x-arrays of type y. The 
size of the class intervals is determined here in the same 
way as it was in the previous discussions in Chapters IX 
and X. There are five units in each class interval. The 
limits of the class intervals are written at the upper and 
left-hand margin of the table. The number of class intervals 
is determined by the range of the grades; from 40 to 
100 in Latin, and from 60 to 100 in mathematics. Noting 
the table (first row at the top) we see that 5 students made 
grades of from 95 to 100 in Latin and from 65 to 70 in 
mathematics; 20 made grades of from 95 to 100 in Latin 
and from 80 to 85 in mathematics, and so on. 

¹ "A Shorter Method for Computing the Coefficient of Correlation," 
Journal of Educational Research, Vol. 1, March, 1920. Also "The 
Application to Tables of Distribution of a Shorter Method for Computing 
Coefficients of Correlation," Journal of Educational Research, 
Vol. 1, April, 1920. 

The heavy black horizontal lines marking the limits of 
the class interval 70-74.99 designate the row and class 
interval that contains the assumed mean of the Latin 
series. The heavy black vertical lines at the limits of the 
class interval 80-84.99 mark the column and class interval 
that contains the assumed mean for the mathematics series. 
The small figures in the upper right-hand corner of the 
squares are the deviations from the assumed mean (in units 
of class intervals) of the subject series, multiplied by the 
corresponding deviations from the assumed mean of the 
relative series, and this product multiplied by the number 
of measures in the class interval. Thus the -75 in the 
second column from the left and the first row from the top 
means that the 5 measures in that square have a deviation 
of -3 from the assumed mean of the mathematics series 
and a +5 from the assumed mean in the Latin series. 
Hence: -3 × 5 × 5 = -75. The other figures are determined 
in a similar way. The $f_x$-row at the bottom gives 
the sum of the measures in the various columns, and the 
$f_y$-column at the right gives the sum of the measures in the 
various rows. The $\Sigma x'y'$-column is the sum of the products 
of the deviations of each measure from the assumed 
mean in each series; that is, -75 + 70 = -5; -72 + (-96) 
= -168, and so on. 



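Table XXXVIII 

(Each square gives the number of measures, followed in parentheses by 
the product of that number and the two deviations from the assumed means.) 

                                   Ability in Mathematics 
Ability in 60-64.99  65-69.99  70-74.99  75-79.99  80-84.99  85-89.99  90-94.99  95-100      f_y  d_y  fd_y  fd_y²  Σx'y' 
Latin 
95-100               5 (-75)                       20 (0)              7 (+70)                32   +5   160    800     -5 
90-94.99             6 (-72)   12 (-96)                                                       18   +4    72    288   -168 
85-89.99                                           35 (0)                        5 (+45)      40   +3   120    360    +45 
80-84.99                                                     20 (+40)  12 (+48)               32   +2    64    128    +88 
75-79.99                                 11 (-11)                                             11   +1    11     11    -11 
70-74.99                                 79 (0)    1 (0)                                      80    0     0      0      0 
65-69.99                                                                                       0   -1     0      0      0 
60-64.99             14 (+84)            12 (+24)                                             26   -2   -52    104   +108 
55-59.99             12 (+108)                               27 (-81)                         39   -3  -117    351    +27 
50-54.99   2 (+32)                                                                             2   -4    -8     32    +32 
45-49.99                                                                                       0   -5     0      0      0 
40-44.99                                           30 (0)                                     30   -6  -180  1,080      0 
Total                                                                                        310        +70  3,154   +116 

f_x        2         37        12        102       86        47        19        5           310 
d_x        -4        -3        -2        -1        0         +1        +2        +3 
fd_x       -8        -111      -24       -102      0         +47       +38       +15        -145 
fd_x²      32        333       48        102       0         47        76        45          683 

$c_y = 70/310 = 0.226$;  $c_x = -145/310 = -0.468$ 
$c_y^2 = 0.051$;  $c_x^2 = 0.219$ 
$c_x c_y = -0.105$ 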

SUMMARY OF STEPS IN THE COMPUTATION OF THE COEFFICIENT 
OF CORRELATION 

1. Find the number of measures in each series and designate 
the number by N. (N will, of course, be the same in each series.) 

2. Estimate the class interval that contains the mean and take 
its mid-point as the assumed mean (the same as was done in finding 
the mean by the short method in Chapter IX). For example, 
70-74.99 for the y's and 80-84.99 for the x's. 

3. Tabulate the deviations from the means chosen in terms of 
class intervals. For example, the first row above the class interval 
70-74.99 has a deviation of +1, the second row +2, and so on. 
The first row below has a deviation of -1, the second a deviation 
of -2. The first column to the right of the mean column 80-84.99 
has a deviation of +1, the second, +2; the first to the left a deviation 
of -1, and the second a deviation of -2, and so on. 

4. To the right of the table make a d-column to record the 
y-deviations, also an $fd_y$-column, an $fd_y^2$-column and a $\Sigma x'y'$-column, 
which latter is the sum of the products of x and y deviations 
calculated from an assumed mean. Similarly record the x 
deviations and other similar values at the bottom of the table. 

5. Multiply each frequency ($f_y$-column for the y's) by its 
respective deviation. For example, 32 × 5 = 160; 18 × 4 = 72; 
and so on, retaining the algebraic signs. Find the fd's for the x's 
in a similar way. 

6. Find the algebraic sum of the fd's. For example, $\Sigma fd_y$ = 
-357 + 427 = 70; similarly $\Sigma fd_x$ = -245 + 100 = -145. 

7. Divide the $\Sigma fd$'s by the number of cases, N, to give the 
correction c. For example, $c_y = 70/310 = 0.226$; $c_x = -145/310 = -0.468$. 

8. Square the corrections: $c_y^2 = 0.051$; $c_x^2 = 0.219$. 

9. Multiply each fd by d, its corresponding deviation, giving 
the figures in the $fd^2$-column. For example, in the y-series, 
160 × 5 = 800; 72 × 4 = 288, and so on. In the x-series -8 × -4 = 
32; and -111 × -3 = 333. 

10. Find the sum of the $fd^2$'s in each series. For example, 
$\Sigma fd_y^2$ = 3,154, and $\Sigma fd_x^2$ = 683. 

11. Divide each of these sums by N, the number of cases, to 
give $S^2$, the square of the standard deviation of each distribution 
around the assumed mean. $S_y^2 = 3{,}154/310 = 10.174$; $S_x^2 = 683/310 = 2.203$. 

12. Subtract $c_y^2$, the square of the correction, from $S_y^2$, and 
$c_x^2$ from $S_x^2$. For example, 10.174 - 0.051 = 10.123 = $\sigma_y^2$; 
2.203 - 0.219 = 1.984 = $\sigma_x^2$. 

13. Find the square roots of $\sigma_y^2$ and $\sigma_x^2$. For example, $\sigma_y$ = 
3.181; $\sigma_x$ = 1.408. 

14. Compute the $\Sigma x'y'$'s by finding the sum of the deviations 
of the measures in a particular row from the assumed mean of the x's of the 
whole table. This gives $\Sigma x'$. Multiply $\Sigma x'$ by y', the deviation 
of this particular row from the assumed mean of the y's of the whole table. 
This gives $\Sigma x'y'$, which is the product-sum of the deviations about 
the two assumed means. 
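
The steps above, together with the correction for the assumed means, can be sketched in Python. This is only an illustration under stated assumptions: the cell frequencies are those inferred from the row and column totals of the Latin and mathematics table (310 cases), each cell is recorded by its deviations from the assumed means in units of class intervals, and the names `CELLS` and `grouped_r` are ours.

```python
import math

# (d_y, d_x, frequency) for each occupied square of the correlation table.
# d_y: Latin deviation from the assumed mean 70-74.99, in class intervals.
# d_x: mathematics deviation from the assumed mean 80-84.99.
# Assumption: frequencies inferred from the totals quoted in the text.
CELLS = [
    (5, -3, 5), (5, 0, 20), (5, 2, 7),
    (4, -3, 6), (4, -2, 12),
    (3, 0, 35), (3, 3, 5),
    (2, 1, 20), (2, 2, 12),
    (1, -1, 11),
    (0, -1, 79), (0, 0, 1),
    (-2, -3, 14), (-2, -1, 12),
    (-3, -3, 12), (-3, 1, 27),
    (-4, -4, 2),
    (-6, 0, 30),
]

def grouped_r(cells):
    """Coefficient of correlation from deviations about assumed means."""
    n = sum(f for _, _, f in cells)                      # step 1
    sum_fdy = sum(f * dy for dy, _, f in cells)          # steps 5-6
    sum_fdx = sum(f * dx for _, dx, f in cells)
    c_y, c_x = sum_fdy / n, sum_fdx / n                  # step 7
    sum_fdy2 = sum(f * dy * dy for dy, _, f in cells)    # steps 9-10
    sum_fdx2 = sum(f * dx * dx for _, dx, f in cells)
    sigma_y = math.sqrt(sum_fdy2 / n - c_y ** 2)         # steps 11-13
    sigma_x = math.sqrt(sum_fdx2 / n - c_x ** 2)
    sum_xy = sum(f * dx * dy for dy, dx, f in cells)     # step 14
    r = (sum_xy / n - c_x * c_y) / (sigma_x * sigma_y)   # corrected formula
    return {"n": n, "sum_fdy": sum_fdy, "sum_fdx": sum_fdx,
            "sum_xy": sum_xy, "r": r}

result = grouped_r(CELLS)
print(result["n"], result["sum_fdy"], result["sum_fdx"], result["sum_xy"])
print(round(result["r"], 3))  # 0.107
```

Working in units of class intervals does not change r, since the class width multiplies the numerator and the denominator alike.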

The product-sum $\Sigma x'y'$ would give us the numerator of the fraction in the 
Pearson formula were it not for the fact that both the x' 
and y' deviations have been computed from assumed means 
instead of the true means and hence must be corrected. 
Since the means were in error by the corrections $c_x$ and $c_y$, 
so each deviation from them on y and x must be in error by 
the same amount. 

Going through the various rows and finding the algebraic 
sum of the products of x'y' we get -5, -168, 45, 
etc., of the $\Sigma x'y'$-column. For example, adding -75 
and +70 gives -5; -72 and -96 gives -168, and so on. 
The algebraic sum of the $\Sigma x'y'$-column = 116, which, when 
divided by the total number of measures, N, gives 116/310 = 0.374 = 
$\Sigma x'y'/N$. $c_x c_y = 0.226 \times (-0.468) = -0.105$. It may be shown 
algebraically that $\Sigma xy / N$, the mean product of the deviations 
of x and y from their true means, $= \Sigma x'y'/N - c_x c_y$. 
We may therefore write the Pearson formula thus.¹ 

We now have made all corrections so that we may substitute 
directly in the formula, 

$$r = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y}$$

which equals the Pearson formula, 

$$r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$$

Substituting, 

$$r = \frac{0.374 - (-0.105)}{3.181 \times 1.408} = \frac{0.479}{4.479} = 0.107$$

¹ Let $E_x$ and $E_y$ represent the estimated means of the two series and 
$c_x$ and $c_y$ be corrections to be applied to the estimated means to get 
the true means. Then the true means, $M_x$ and $M_y$, are respectively, 
$M_x = E_x + c_x$ and $M_y = E_y + c_y$. 

Let x and y be deviations from the true means, $M_x$ and $M_y$. 

Let x' and y' be deviations from the estimated means, $E_x$ and $E_y$. 

Thus 

$x' = x + c_x$ and $y' = y + c_y$. 

Therefore, 

$\Sigma x'y' = \Sigma (x + c_x)(y + c_y)$. 

Now since $\Sigma x$ and $\Sigma y$ (the sum of the x and y deviations from the 
true mean) each = 0, then 

$\Sigma x'y' = \Sigma xy + N c_x c_y$, or $\Sigma xy = \Sigma x'y' - N c_x c_y$. 

Or, substituting this expression in the equation 

$$r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$$

we get, 

$$r = \frac{\Sigma x'y' - N c_x c_y}{N \sigma_x \sigma_y} = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y}$$

(Adapted from H. L. Rietz, Bulletin No. 148, University of Illinois 
Agricultural Experiment Station, 1910.) 

3. Illustrating the Computation of the Coefficient of Correlation 
by the Short Method (Adapted from Ayres): (a) 
Series Simple and Ungrouped. — A shorter and more direct 
method has been evolved by Ayres. We shall now illustrate 
the computation of the coefficient of correlation by 
this method by two simple series with data ungrouped. 
The method used for finding the coefficient of correlation 
in Table XXXVIII was a long one even though many 
"short cuts" were used. 

In Table XXXV we gave two simple series to illustrate 
the computation of the coefficient of correlation. The 
process will be repeated in part to show the advantages 
of the "short cuts" used by Ayres. The series were: 

Subject Series    Relative Series 
       3                 5 
       4                 8 
       5                 7 
       8                12 
      10                13 

The mean of the subject series is 6, that of the relative, 9. 
Calling the deviation from the mean in each series x and y 
respectively, and multiplying the deviation in one series 
by its corresponding deviation in the other and also finding 
the standard deviations of the two series, we have: 

$\Sigma xy = 38$, $\Sigma x^2 = 34$, $\Sigma y^2 = 46$, 

which when substituted in the Pearson formula, gives 

$$r = \frac{38}{\sqrt{34 \times 46}}$$


The shorter method proposed by Ayres gives the sums of 
the products and the sums of the squares of the deviations 
directly from the squares of the original numbers. By this 
method it is not necessary to arrange the measures in order 
of their magnitude or to find the separate deviations from the 
means. Thus is avoided the necessity of taking into account 
the plus and minus signs of deviations. 

There are two fundamental principles in the computation 
of the coefficient of correlation by this method which 
the student should master. 

1. This method considers every number in the series as 
being equal to the mean of the series plus a plus or minus 
deviation from that mean. Thus 3, which is the first number 
in the first series in the above illustration, equals the mean 
of the series, or 6, plus a minus deviation, -3; 4 equals 
the mean of the series, or 6, plus a minus deviation, -2; 
and so on. 

2. If the sum of the squares of the numbers in a series is 
found and from it is subtracted the product found by multiplying 
the square of the mean of the series by the number of 
cases, the remainder will be the sum of the squares of the deviations 
from the mean. 

For example, in the subject series given above the numbers, 
their mean, and squares are: 

        3 squared =   9 
        4    "    =  16 
        5    "    =  25 
        8    "    =  64 
       10    "    = 100 
Sum    30           214 

Mean = 30/5 = 6; mean squared = 36 
36 × 5 (number of cases) = 180 
214 - 180 = 34 = sum of the squares of the deviations 


By utilizing this method the coefficient of correlation may 
be computed directly from the products and the squares of 
the items of the two series without finding the separate 
deviations. 



The operations to be performed may be expressed as 
follows: 

$$r = \frac{\Sigma SR - \dfrac{\Sigma S \cdot \Sigma R}{N}}{\sqrt{\left(\Sigma S^2 - \dfrac{(\Sigma S)^2}{N}\right)\left(\Sigma R^2 - \dfrac{(\Sigma R)^2}{N}\right)}}$$

where S = the individual subject items, as 3, 4, 5, 8, 10, in 
          the subject series above 
      R = the individual relative items 
      Σ = the sum 
      S² = the square of each individual subject item, as 9, 
          16, 25, 64, 100 
      R² = the square of each individual relative item 
      N = the number of cases 

Making the corrections and substituting in the formula 
with data used above, we have 

$$r = \frac{308 - \dfrac{30 \times 45}{5}}{\sqrt{\left(214 - \dfrac{30^2}{5}\right)\left(451 - \dfrac{45^2}{5}\right)}} = \frac{308 - 270}{\sqrt{(214 - 180)(451 - 405)}} = \frac{38}{\sqrt{34 \times 46}}$$

In order to make the above substitutions clear, let us go 
through the various steps and note how each part is derived. 
The 308 is the sum of the products of the subject and relative 
items, that is: 3 × 5 + 4 × 8 + 5 × 7 + 8 × 12 + 10 × 13 = 308. 
The fraction $\frac{30 \times 45}{5}$ is the sum of the subject items, 30, 
multiplied by the sum of the relative items, 45, and divided 
by the number of cases, 5. The first number under the 
radical sign, 214, is the sum of the squares of the items in 
the subject series. The fraction, $\frac{30^2}{5}$, does not seem at 
first to carry out the directions in the second principle 
stated above which says to subtract from the sum of the 
squares of the numbers the product found by multiplying 
the square of the mean of the series, 36, by the number of 
cases, or 5. The mean is 6, or $\frac{30}{5}$, which when squared = 36, 
or $\frac{30^2}{5^2}$, and when multiplied by the number of cases gives 
180, or $\frac{30^2}{5}$. The other numbers under the radical are 
found in the same way. 
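
The short method for ungrouped data can be sketched as follows. The function name `ayres_r` is ours and the code is only an illustration of the operations described, working from the sums, products, and squares of the original items without finding any deviations.

```python
import math

def ayres_r(subject, relative):
    """Coefficient of correlation from raw sums, products, and squares."""
    n = len(subject)
    sum_s = sum(subject)                                  # 30
    sum_r = sum(relative)                                 # 45
    sum_sr = sum(s * r for s, r in zip(subject, relative))  # 308
    sum_ss = sum(s * s for s in subject)                  # 214
    sum_rr = sum(r * r for r in relative)                 # 451
    numerator = sum_sr - sum_s * sum_r / n                # 308 - 270
    under_radical = (sum_ss - sum_s ** 2 / n) * (sum_rr - sum_r ** 2 / n)
    return numerator / math.sqrt(under_radical)

subject = [3, 4, 5, 8, 10]
relative = [5, 8, 7, 12, 13]

r_short = ayres_r(subject, relative)
print(round(r_short, 3))  # 0.961
```

The result agrees with the value obtained from the deviations, since 308 - 270 = 38, 214 - 180 = 34, and 451 - 405 = 46.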

(5) Data Complex and Grouped in Claa8 Intervals. — We 
shall now illustrate the computa.tion of the coefficient of cor- 
relation with the data grouped in class intervals. To show the 
similarity of this method to the one used_in Table XXXVIII 
we shall use the same data that were used in that table. 

The data should be arranged in a correlation table the 
same as in Table XXXVIII. At the left of the table (see 
Table XXXIX), and at the bottom, instead of insertmg the 
class intervals 40 to 44.99, 45 to 49.99, etc., as was done in 
Table XXXVIII, insert the numbers 1, 2, 3, etc., m the 
column marked S. In order to keep the data straight, the 
class intervals may be inserted if desired as in Table 
XXXVIII and the numbers in column iS inserted later. 
It should be noted that multiplying the frequencies by the 
numbers 1, 2, 3, etc., gives them the same relative values 
as if they were multiphed by the mid-points of the class 
intervals 42.6, 47.5, 52.5, etc. In like manner insert the 
figures 1, 2, 3, etc., in the margin at the top of the table. 
When data are grouped in class intervals, we may assume 
that they are grouped at the mid-point of the class interval. 
Thus, in the first row at the top, the class interval is 95 to 100 
(see Table XXXVIII), and all measures are assumed to be 
grouped at the mid-point, or at 97.5. The class interval in 
the second row is 90 to 94.99, and its mid-point is 92.5. 
Therefore all measures in the top row may be said to have a 



342 EDUCATIONAL MEASUREMENT 

value of 97.5 and in the next row, 92.5, and so on. Similarly 
the meaaures in the columns may be conaidered as grouped at 
their mid-points as in Table XXXVIII. The first colmmi 
to the left would then have a value 62,5, the second one 
67.5, and so on. To compute the coefficient of correla- 
tion by this method it is necessary to multiply the total 
number of measures in the rows and columns by their 
magnitude. Thus, to get the Sr-cohmin at the right of 
the table, we accordingly would have to multiply the total 
number of measures in the first row, aummed in the T- 
column, by the magnitude of the measures, which in thia 
case would be 97.5. In order to get the measures in the 
iSSr-column we would have to square the 97.5 and multiply 
it by the sum of the measures in the first row, or 32, But 
in order to avoid the squaring of large numbers and also 
the multiplication by large numbers, we insert the num- 
bers 1, 2, 3, etc., instead of the mid-values of the clasa 
intervals. 

The T, RT, and RRT rows at the bottom are found in the 
same way as the T, ST, and SST columns at the right. 
(Use Table I, Appendix, for aquarii^ the numbers.) 

The ΣSf-row at the bottom of the table is found by multiplying
the frequencies f by the subject series, S, finding their
sum and recording in the proper squares below. Thus in
the first column 3×2 = 6; in the second column, 12×5 = 60;
11×6 = 66; 5×14 = 70; 4×12 = 48; 60+66+70+48 = 244.
The other numbers in the row are found in a similar way.

The R(ΣSf)-row is found by multiplying the numbers
in the ΣSf-row by R, the relative series. Thus 1×6 = 6;
2×244 = 488; 3×132 = 396, etc.

In the computation below the diagram the numbers are 
taken from the totals at the right and the bottom of the 
columns and rows respectively. Substituting the values in
the Pearson formula we have, r = 0.107, the same as by the 
other method on page 334. 






(Adapted from Ayres)

Subject |    1     2     3      4      5      6      7     8 |      T     ST     SST
   12   |          5                  20             7       |     32    384   4,608
   11   |          6    12                                   |     18    198   2,178
   10   |                             35                   5 |     40    400   4,000
    9   |                                    20     12       |     32    288   2,592
    8   |                      11                            |     11     88     704
    7   |                      79      1                     |     80    560   3,920
    6   |                                                    |
    5   |         14           12                            |     26    130     650
    4   |         12                         27              |     39    156     624
    3   |    2                                               |      2      6      18
    2   |                                                    |
    1   |                             30                     |     30     30      30
    T   |    2    37    12    102     86     47     19     5 |    310  2,240  19,324
   RT   |    2    74    36    408    430    282    133    40 |  1,405
  RRT   |    2   148   108  1,632  2,150  1,692    931   320 |  6,983
  ΣSf   |    6   244   132    701    627    288    192    50 |
 R(ΣSf) |    6   488   396  2,804  3,135  1,728  1,344   400 | 10,301





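2,240 × 4.532 = 10,151.68      10,301 − 10,151.68 = 150 (approximately)
2,240 × 7.226 = 16,186.24      19,324 − 16,186.24 = 3,138 (approximately)
1,405 × 4.532 = 6,367.46       6,983 − 6,367.46 = 616 (approximately)

$$r = \frac{150}{\sqrt{3{,}138 \times 616}} = 0.107$$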
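The computation from the marginal totals can be repeated without rounding the means. This is a minimal sketch using only the totals printed in the table; the variable names are illustrative, not the author's notation.

```python
import math

N = 310              # total number of cases (grand total of T)
sum_rows = 2240      # total of the ST column
sum_cols = 1405      # total of the RT row
sum_rows_sq = 19324  # total of the SST column
sum_cols_sq = 6983   # total of the RRT row
sum_products = 10301 # total of the R(ΣSf) row

# Sums of products and of squares, taken about the means.
numerator = sum_products - sum_rows * sum_cols / N
dev_rows = sum_rows_sq - sum_rows ** 2 / N
dev_cols = sum_cols_sq - sum_cols ** 2 / N

r = numerator / math.sqrt(dev_rows * dev_cols)
print(round(r, 3))
```

Carrying the unrounded means gives deviations slightly different from 150, 3,138 and 616, but the coefficient still agrees with 0.107 to three decimal places.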




We may further clarify the process by comparing the
steps here with the simpler problem solved by the Ayres
short method on page 340. The fraction $\frac{2{,}240}{310} = 7.226$, the
mean of the relative series. $\frac{1{,}405}{310} = 4.532$, the mean of the
subject series. It should be noted that in each case it is
the sum of the values of the individual measures divided by
the number of cases. $10{,}301 = \Sigma(S \times R)$ in the general
formula and corresponds to the 308 in the simple problem.

$2{,}240 \times 4.532 = 10{,}151.68 = \frac{\Sigma S \cdot \Sigma R}{N}$ of the general formula;
$10{,}301 - 10{,}151.68 = 150$ (approximately), the simplified numerator of the fraction
corresponding to 38 in the simple problem. $6{,}983 = \Sigma S^2$ in
the denominator of the general formula and $19{,}324 = \Sigma R^2$;
$2{,}240 \times 7.226 = 16{,}186.24 = \frac{(\Sigma R)^2}{N}$ and $1{,}405 \times 4.532 = 6{,}367.46 = \frac{(\Sigma S)^2}{N}$.

4. Representing the Degree of Correlation Between
Two Traits by the Graphic Method: (a) Data Simple and
Ungrouped. In making a correlation table and inserting
the values we must, of course, deal with each pair of correlated
values separately in order to locate them in the
right place in the correlation table. In plotting pairs of
measures we construct two coordinate axes, one horizontal
(OX), and the other vertical (OY), meeting at an origin, or
beginning point, at the bottom and left. On these axes
we lay off scales representing the traits in question. The
scales need not be made up of the same units. If we desire
to find the correlation between the heights and weights of
individuals, for instance, we would lay off one scale on, say,
the OX-axis, to represent height and the other on the OY-axis
to represent weight. The units of the scale are laid
off from the origin to the right on OX and upward on OY.



When one desires to insert a pair of measures in a correlation
table thus constructed, he finds the proper place on the
scale on the X-axis for one trait and the proper place on the
Y-axis for the corresponding trait. He then erects perpendiculars
at each of the points thus located and at the
intersection of these perpendiculars a dot or small cross is
made, which represents the pair of values in the correlation
table. In Table XL below are given the grades made by
15 pupils in mathematics and Latin. Figure X shows the
location of these 15 pairs of values in a correlation table.
Pupil A made a grade of 95 per cent in mathematics and 93
per cent in Latin. In inserting these values we find the
place on the X-axis that is marked 95 and erect a perpendicular
to the X-axis at that point. Similarly, we find the
point on the Y-axis that corresponds to a grade of 93 per
cent and erect a perpendicular to the Y-axis at that point.
At the intersection of these two perpendiculars we make a
small cross which represents the pair of values in the correlation
table. The other grades are similarly located in the
table.

[Figure X. Axis label: Abilities in Mathematics.]



Table XL

Pupil    Mathematics    Latin
  A          95           93
  B          93           93
  C      [illegible]      94
  D          88           89
  E          87           85
  F          70           95
  G          80           82
  H          86           85
  I          81           79
  J          75           70
  K          70       [illegible]
  L          69           71
  M          65           69
  N          60           60
  O          55           60



Many Pairs of Values With Data Grouped in Class
Intervals. If perpendiculars were erected at the junction
points of each of the class intervals on both the X- and
Y-axes in Figure X, we would have a convenient device for
tabulating any number of pairs of values in the correlation
table. When data are thus grouped in class intervals,
we make the same assumption we did in the computation
of the mean and median, namely, that all the measures
are grouped at the mid-point of the class interval. Considering
any one square in the distribution table we assume
that the measures are grouped at the mid-point of the
class interval for Latin abilities and also for mathematics
abilities, hence all the measures in a square are considered
to be grouped at the mid-point of the square. When the
data are thus recorded in the correlation table we count
the number of measures in each square and insert the proper
figure to represent these measures.

We are now ready to make an estimate of the degree of
correlation existing between the two traits. This may be
done by drawing a line that most closely approximates the
general scattering of the pairs of measures over the table.
In drawing such a line we must take cognizance of the
number of measures in each square. The line OD is the best
fitting line for the 15 pairs of values in Table XL and is known
as the correlation line.

It is evident that this method is inaccurate, because we 
could not take two similar correlation tables and tell which 
had the higher degree of correlation, for the position of the 
line OD is only estimated and not determined accurately.
The measures might be rather similar in " scatter " but one 
would be unable to tell the exact degree of correlation in 
either table. 

In order to represent both graphically and accurately
the degree of correlation existing between two traits, we
must resort to a mathematical formula, such as the one
devised by Pearson, that will take cognizance of the exact
value of each measure in the correlation table. A line
drawn that most closely approximates the general scattering
of the pairs of measures in the table will bear a constant
relation to the X-axis, which is expressed by the fraction $\frac{y}{x}$.
But the ratio $\frac{y}{x}$ is the tangent of the angle made by the correlation
line and the X-axis (measured from the Y-axis if
the angle is more than 45 degrees). We may take, therefore,
any value of $\frac{y}{x}$ (which will be a value between 0 and 1)
and with a table of natural tangents determine the number
of degrees the line of correlation makes with the X-axis.
The line may then be drawn accurately in the correlation
table.

We may further illustrate the graphic method of expressing
correlation by referring to Figure XI. In that figure
let the line XOX' be the mean of abilities in mathematics
and the line Y'OY the mean of the abilities in Latin. Then
it is evident that if a student who was 5 units above the
mean in Latin was also 5 units above the mean in mathematics,
and that if one 3 units above the mean in Latin was
3 units above in mathematics, and the same ratio prevailed
for all students both above and below the means of the two
abilities, a curve plotted from these data would be a straight
line and would bisect the angle Y'OX'; that is, it would make
an angle of 45 degrees with the X-axis, and the correlation
would be perfect. If, however, the correlation were not
perfect, then the line would make some other angle with
the X-axis, depending on the degree of correlation between
the two traits.

[Figure XI]

If the correlation were zero, the line would take the
position OY' or OX'. This would mean that any given
change in a y-value would be accompanied by no change
in an x-value and vice versa. If the correlation were
negative, then the line would swing to the left of the line
OY' taking any position as ON'. This would mean that
as the y-values grow larger the x-values would grow smaller.
If we have a perfect negative correlation, the line ON'
would make an angle of 45 degrees with the X-axis.

It thus may be seen then that the degree of correlation
existing between two traits is expressed by the angle that
the line of correlation makes with the X-axis. If the x-values
increase more rapidly than the y-values, the line of
correlation will lie in the angle DOX'. If the y-values
increase more rapidly than the x-values, it will lie in the
angle DOY'. It should be noted that, since the lines XOX'
and YOY' represent the means of the two traits, any x-value
greater than the mean will lie to the right of the line
YOY' and will deviate from the mean positively, while a
value below the mean will lie to the left of the line YOY'
and will be a negative deviation. In like manner any y-value
above the mean will have a positive deviation from the mean
and will lie above the XOX' line, while a value less than
the mean will lie below XOX'. The signs for the four
quadrants may, therefore, be represented thus:

Second Quadrant        First Quadrant
    x = -                  x = +
    y = +                  y = +

Third Quadrant         Fourth Quadrant
    x = -                  x = +
    y = -                  y = -


When the correlation was not perfect, the degree of correlation
was expressed by Galton thus: Draw any horizontal
line AC cutting the line of perfect correlation OD at B and
the line ON at C. Then the ratio $\frac{AB}{AC}$ measures the amount
of correspondence in change in the two variables. A given
change in the size of y is accompanied by a proportional
change in the size of x. When the line ON swings to the
position OD, then $\frac{AB}{AC} = 1$ and the correlation is perfect and
positive. When ON takes the position OY, the ratio $\frac{AB}{AC}$ =
infinity, since AC will equal zero; that is, a given change
in the size of y is accompanied by no change in the size of x.
When the line ON swings to the left of the line YOY', the
degree of correlation is measured in a similar way from that
quadrant but the correlation is negative. The ratio $\frac{AB}{AC}$ is
called the coefficient of correlation and is denoted by r.

We shall now note another very important term in the
comparison of traits. Galton found in measuring the
heights of individuals that if a group of parents were found
to be, say, y inches above or below the mean of the race in
stature, the mean stature of their children would not
deviate y inches from the mean of the race, but would
deviate only 2/3 y inches above or below the mean of the
race. In expressing this fact Galton said that the offspring
tended to "regress" towards the mean of the race.
Since then it has been common to speak of the line of means
of the correlation table as the line of regression. Since there
is a line of means of the columns and also one for the rows,
there will, of course, be two lines of regression. The reader
should note that the regression lines are not the same as
the line that represents the coefficient of correlation. Each
is expressed by a different equation, as will be shown later.




Finding the Equation of a Straight Line of Regression.
The most definite way to describe a line is to write its
equation. Since the relationship of most of the educational
data may be expressed by a straight line, we shall note how
to write the equation of a straight line. In order to do this
we must be able to put two variables as x and y together in
an algebraic equation in such a way that a given change
in the value of one is accompanied by a proportional change
in the value of the other. Let us write the equation of the
line PQ in Figure XII. The equation of a line may be
written if we know the value of any two points on it. From
Figure XII we note that point Q has an x-value of +5 and
a y-value of +1. These values are measured from point O,
the origin of the graph. The distance QM is called the
ordinate of point Q and the distance QN is called the abscissa
of point Q. The values (+5, +1) are called the coordinates
of Q, and in like manner (−5, +7) are called the coordinates
of P. The axes are spoken of as coordinate axes. The
equation of the line PQ is $3x + 5y = 20$. For any two points
on the line PQ, the ratio of the difference of their ordinates
to the difference of their abscissas may be expressed as
$\frac{y_1 - y_2}{x_1 - x_2}$. It is evident that this ratio is constant for all
points on the line. This ratio is the tangent of the angle
that the line PQ makes with the line CD. (The tangent of
an angle is the side opposite the angle divided by the adjacent
side.) Since the ratio $\frac{y_1 - y_2}{x_1 - x_2}$ measures the inclination of
the line PQ, it is called the slope of the line.

[Figure XII]

Let point Q have an x-value of plus 5 and a y-value of
plus 1 (+5, +1) and point P have an x-value of minus 5 and
a y-value of plus 7 (−5, +7). Since point P has an x and
y value and point Q also, the points may be designated
$P(x_1, y_1)$ and $Q(x_2, y_2)$, $x_1$ and $x_2$ being the abscissas of points
P and Q respectively, and $y_1$ and $y_2$ the ordinates. The
slope of the line PQ equals $\frac{y_1 - y_2}{x_1 - x_2}$. Let $P_1(x, y)$ be any other
point on the line PQ. Then the slope of $P_1P$ will be $\frac{y - y_1}{x - x_1}$;
since $P_1$, P, and Q are on one line, slope $P_1P$ = slope PQ.
Hence we have the formula

$$\frac{y - y_1}{x - x_1} = \frac{y_1 - y_2}{x_1 - x_2}.$$

Substituting in the equation the values for the coordinates of points P
and Q we have

$$\frac{y - 7}{x + 5} = \frac{7 - 1}{-5 - 5}, \quad\text{or}\quad \frac{y - 7}{x + 5} = \frac{6}{-10}.$$

Clearing fractions we have $3x + 5y = 20$, which is the equation of the line
desired.
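The two-point form above lends itself to a short check. The sketch below assumes integer coordinates and uses a hypothetical helper, `line_through`, that returns the coefficients (a, b, c) of ax + by = c reduced to lowest terms.

```python
import math
from fractions import Fraction

def line_through(p, q):
    """Coefficients (a, b, c) of a*x + b*y = c for the line through p and q.

    Derived from (y - y1)(x1 - x2) = (y1 - y2)(x - x1); integer inputs assumed.
    """
    (x1, y1), (x2, y2) = p, q
    a = y1 - y2
    b = x2 - x1
    c = a * x1 + b * y1
    g = math.gcd(math.gcd(abs(a), abs(b)), abs(c)) or 1
    a, b, c = a // g, b // g, c // g
    # Make the leading coefficient positive for a conventional form.
    if a < 0 or (a == 0 and b < 0):
        a, b, c = -a, -b, -c
    return a, b, c

P = (-5, 7)
Q = (5, 1)

slope = Fraction(P[1] - Q[1], P[0] - Q[0])   # (y1 - y2) / (x1 - x2)
coeffs = line_through(P, Q)
print(slope, coeffs)
```

Both P and Q satisfy the resulting equation, and the slope is the same ratio 6/(−10) found in the text.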

We now desire the equation of a line that will take
cognizance of the "scatteration" of the measures from the
means of the corresponding rows and columns. We noted
that the degree of correlation is measured in terms of the
relative amounts of deviation of each point from the mean
of the column and from the mean of the row in which it
falls. In the measurement of dispersion from the means
of the columns and the rows we must use the same unit of
deviation, if we desire the amounts of dispersion to be
comparable. The value of standard deviation having been
demonstrated in Chapter XI, under the heading of
"Measurement of Variability," as the best measure of
dispersion, it is therefore employed to measure dispersion
in writing the equation of the line of regression.

We know that the line that "best fits" the means of the
arrays is that line from which the deviations of the means
are the least possible. Professor Karl Pearson, who developed
the equation of the line, employed the method of
least squares (discussed in a previous chapter) in the location
of this line; that is, he located the line in such a position
that the sum of the squares of the deviations of the means
of the arrays, each weighted by the number of measures in
the respective arrays, would be the minimum.

Pearson's Equation for a Line of Regression. Pearson
deduced the equation of the "best fitting" line as:

$$y_1 - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x_1 - \bar{x}).$$

Meaning of Symbols Used

$\bar{y}$ = the mean of the columns
$\bar{x}$ = the mean of the rows
$y_1$ = a particular measure in the columns
$x_1$ = a particular measure in the rows
$r$ = the coefficient of correlation (the expression designating it is illegible in the source)
$\sigma_y$ = the standard deviation of the y series
$\sigma_x$ = the standard deviation of the x series

The equation may be written in a condensed form as

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x,$$

in which y = the deviation of a particular y-measure from
its mean, or $y = y_1 - \bar{y}$. Similarly $x = x_1 - \bar{x}$.

We noted above that the slope of the line was represented
by the ratio $\frac{y}{x}$ if the line passes through the origin of
the graph. Let this value be represented by m; that is,
$m = \frac{y}{x}$ and $y = mx$. We also note that

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x.$$

Therefore,

$$mx = r\,\frac{\sigma_y}{\sigma_x}\,x, \quad\text{and}\quad m = r\,\frac{\sigma_y}{\sigma_x}.$$

The slope of the line is therefore represented by $r\,\frac{\sigma_y}{\sigma_x}$.
The expression $r\,\frac{\sigma_y}{\sigma_x}$ is known as the regression coefficient of
y on x; that is, the deviation of y corresponding on the average
to a unit change in the type of x. If we used the other
regression line, the slope would be expressed $r\,\frac{\sigma_x}{\sigma_y}$, which is
the regression coefficient of x on y and means that deviation
of x which corresponds to a unit change in the type of y.
It should be noted that in a perfect correlation, that is,
where the variability of the two traits is the same, there
are no regression lines and m = r.

Let us see what regression coefficients mean in the
problem we have solved above, Table XXXVIII.

There it was found that r = 0.107, $\sigma_x$ = 1.408, $\sigma_y$ = 3.181.

$r\,\frac{\sigma_y}{\sigma_x} = 0.107 \times \frac{3.181}{1.408} = 0.107 \times 2.259 = 0.241$, regression coefficient
of y on x.

$r\,\frac{\sigma_x}{\sigma_y} = 0.107 \times \frac{1.408}{3.181} = 0.047$, regression coefficient of x on y.

y = 0.241x, which means that for every unit deviation
from the type of x (ability in mathematics) it is most probable
that there will be an accompanying deviation of 0.241
as much in y (ability in Latin).

x = 0.047y, which means that for every unit of deviation
from the type of y (ability in Latin) it is most probable that
there will be an accompanying deviation of 0.047 in x (ability
in mathematics).
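The two regression coefficients follow directly from r and the two standard deviations. A minimal sketch, using the values quoted for Table XXXVIII; the variable names are illustrative.

```python
r = 0.107        # coefficient of correlation
sigma_x = 1.408  # standard deviation of the mathematics (x) series
sigma_y = 3.181  # standard deviation of the Latin (y) series

# Regression coefficient of y on x: slope of the line y = b_yx * x.
b_yx = r * sigma_y / sigma_x

# Regression coefficient of x on y: slope of the line x = b_xy * y.
b_xy = r * sigma_x / sigma_y

print(round(b_yx, 4), round(b_xy, 4))
```

Carried to four places the first coefficient is slightly above 0.241, the figure used in the text, which drops the later decimals.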

The correlation coefficient is the geometric mean of the
two regression coefficients. That is, the square of the correlation
coefficient is equal to the product of the two regression
coefficients. Let us represent the regression coefficients
by $\rho_1$ (rho) and $\rho_2$ respectively. Then $r^2 = \rho_1 \times \rho_2$. This
serves as a valuable check on the computation of these
coefficients. In the problem cited above $r^2$ = 0.0114 and
$\rho_1 \times \rho_2$ = 0.0113. The difference is due to dropping small
fractions in the computation.
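The check can be run both with the rounded coefficients and with unrounded ones. This is a sketch under the figures given for Table XXXVIII; the names are illustrative.

```python
r = 0.107
sigma_x = 1.408
sigma_y = 3.181

# Unrounded regression coefficients: their product equals r squared.
exact_product = (r * sigma_y / sigma_x) * (r * sigma_x / sigma_y)

# Coefficients rounded as in the text: the product falls slightly short.
rounded_product = 0.241 * 0.047

r_squared = r ** 2
print(round(r_squared, 4), round(rounded_product, 4))
```

The small gap between the two printed figures comes entirely from rounding the coefficients to three places before multiplying.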

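The regression coefficients may be taken directly from
the values computed in the Ayres Short Method also. In
Table XXXIX it was shown that

$$r = \frac{150}{\sqrt{3{,}138 \times 616}} = 0.107.$$

The fraction $\frac{150}{3{,}138}$ is the regression coefficient of x on y and
the fraction $\frac{150}{616}$ is the regression coefficient of y on x. The
following computation shows that these fractions are equal
to the regression coefficients. Let the regression coefficient
of x on y be represented by $\rho_1$. Then since $r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$,

$$\rho_1 = r\,\frac{\sigma_x}{\sigma_y} = \frac{\Sigma xy}{N \sigma_x \sigma_y} \cdot \frac{\sigma_x}{\sigma_y} = \frac{\Sigma xy}{N \sigma_y^2} = \frac{\Sigma xy}{\Sigma y^2} = \frac{150}{3{,}138},$$

since $\sigma_y = \sqrt{\Sigma y^2 / N}$.

In like manner it may be shown that $\rho_2$ is equal to the fraction $\frac{150}{616}$.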

Coefficients of regressions may be made to yield valuable
information in many ways. They may be used in
studying such questions as regularity of attendance and
promotion rates; progress through the grades and expenditure
for supervision; time spent on spelling drills and
scores made in spelling ability; years of schooling and
earnings up to a certain age. In fact, coefficients of regressions
may be made to yield very valuable information along
a great many school lines. In the correlation between
Latin and mathematics (cited above) we may use regression
coefficients to gain information of the following type.

Suppose a student makes a grade of 85 per cent in mathematics,
what is the most probable grade he will make in
Latin? From the computation of the correlation and
regression coefficients in Table XXXVIII we have the
following data:

0.107 = r, the coefficient of correlation;
1.408 = the standard deviation of the mathematics or subject series;
3.181 = the standard deviation of the Latin or relative series;
73.09 = the mean of the Latin series;
80.16 = the mean of the mathematics series.

We also have y = 0.241x which is the equation for the
regression of y on x. Since the mean of the subject series is
80.16, a student who gets a grade of 85 in mathematics
would be 4.84 above the mean. In Figure XIV we have
the means of the two series with the line ST passing through
the point of their intersection. The line ST is the line
whose equation is y = 0.241x. Since this line passes through
the origin of the graph (the intersection of the two lines of
means) we may assume any values for x and compute the
corresponding values for y. If we assume x to be 1, then
y = 0.241. This means that as the x-values increase one
unit above the mean of the subject series, that is, the YOY'
axis, the y-values will increase 0.241 as much above the
mean of the relative series, that is, the XOX' axis. When,
therefore, we assume a value of 85 for x which is 4.84 above
the mean of the subject series, the value of y will be 0.241
as much above the mean of the relative series. That is,
0.241 × 4.84 = 1.166 which, when added to the mean of
the Latin series, 73.09, gives 74.26, which is the most probable
grade that would be made in Latin.

The general procedure for predicting ability in one trait 
when the correlation between two traits and ability in one 
are given is as follows: Let the trait in which the ability is 
given be the subject series. Compute the means of both 
series and write the equation for the regression line using 
the regression coefficient for y on x. Assume any value for 
X, as 85 in the problem above. Determine how much this 
is above or below the x-mean. 

Then, in the equation for the regression line, assume x to 
be 1 and compute the value for y which is 0.241 in the prob- 
lem above. Multiply the difference between the mean for 
the x-series and the assumed x-value (85 − 80.16) by the
value of y and add the product to the y-mean. 
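The general procedure may be sketched as a small function. The function name `predict_y` and the constant names are illustrative assumptions; the numerical values are those given for Table XXXVIII, applied here to assumed mathematics grades from 80 to 100 in steps of 5.

```python
def predict_y(x, mean_x, mean_y, b_yx):
    """Most probable y for a given x, using the regression of y on x."""
    deviation_x = x - mean_x          # how far x lies from its mean
    return mean_y + b_yx * deviation_x

MEAN_MATH = 80.16    # mean of the subject (x) series
MEAN_LATIN = 73.09   # mean of the relative (y) series
B_YX = 0.241         # regression coefficient of y on x

# Assumed x-values 80, 85, ..., 100 and the corresponding y-values.
table = {x: round(predict_y(x, MEAN_MATH, MEAN_LATIN, B_YX), 2)
         for x in range(80, 101, 5)}

for x, y in table.items():
    print(x, y)
```

A grade below the x-mean gives a negative deviation, so the predicted Latin grade then falls below the Latin mean by the same proportion.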

It should be noted also that the approximate values may 
be read directly from the graph of the regression lines. 
From Figure XIV we note that as we move 5 units to the
right of the subject mean the regression line has risen 0.241
as much above the relative mean. We may therefore 
assume any value for x and read the corresponding values 
for y directly from the graph. 

Assuming values for the mathematics series from 80 to
100, increasing steps of 5, we may accordingly write the
corresponding values for y as follows:

The Regression of y on x

Assumed x-values    Corresponding y-values
       80               [illegible]
       85                  74.26
       90               [illegible]
       95               [illegible]
      100                  77.87



If it is desired to assume values for the Latin in the above
problem and compute the corresponding mathematics
grades, we use the regression of x on y and the equation for
the other regression line, which is x = 0.047y.

The line PQ (Fig. XIV) is the line desired. It is measured
from the YOY' axis because we desire the regression of
x on y. It should be noted that we can assume any value
for Latin and read off the approximate mathematics grade
directly from the graph, the same as with the other regression
line.

The value of the correlation coefficient, r, is always the
value of the tangent of the angle that the correlation line
makes with the X-axis where it passes through the point of
intersection of the X and Y axes. It is never equal to the
regression lines. If the correlation is perfect, there is no
regression, hence no regression lines. In the above problem
r = 0.107. Referring to a trigonometric table of natural
tangents we find that 0.107 is the tangent of the angle 6° 7'.

Therefore, if we desired to draw the correlation line in
the above correlation Table XXXVIII, it would make an
angle of 6° 7' with the X-axis.
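In place of a printed table of natural tangents, the angle may be computed with the inverse tangent. A minimal sketch; the function name `angle_from_tangent` is an assumption for illustration.

```python
import math

def angle_from_tangent(t):
    """Angle in degrees whose tangent is t."""
    return math.degrees(math.atan(t))

angle = angle_from_tangent(0.107)

# Split into whole degrees and minutes for comparison with a tangent table.
whole_degrees = int(angle)
minutes = (angle - whole_degrees) * 60
print(whole_degrees, round(minutes, 1))
```

The computed angle is a little over 6 degrees and lies within about one minute of the table reading of 6° 7'; a printed table gives only the nearest tabulated entry.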

The significance of the tangent in determining the slope
of the correlation line and also the reasons why the value
of r, the coefficient of correlation, may vary from −1
through 0 to +1 may be made clearer by geometry in the
following illustration.

About O as a center, describe a circle with a radius equal
to 1.

Let XOX' be a diameter of the circle and YOY' another
diameter perpendicular to it. Let CX' be a geometrical
tangent to the circle drawn perpendicular to the diameter
XOX', and let OC be a line bisecting the angle YOX' and
intersecting the tangent CX' at C. Then, by geometry, we
know that the triangle COX' is a right angle triangle, that
the angles COX' and X'CO are each 45°, and that the side
CX' = the side OX'. By construction OX' = 1. The line
CX' therefore equals 1, and since the line CX' is tangent
to the circle at X', the numerical value of such a tangent,
limited by the point of intersection of a line forming an
angle of 45° with the horizontal axis, is 1.

From O draw the line OC', making the angle C'OX' equal
to 6° 7'. Now since CX' = OX' = 1, and since the value of r,
the coefficient of correlation, is 0.107, then the value of
C'X' as a percentage of 1 is $\frac{C'X'}{CX'} = \frac{10.7}{100}$ or 0.107. Now it is
evident that if the angle C'OX' were zero, the tangent would
be zero and hence r would equal zero. It is also evident
that r can never be greater than 1 because when the angle
becomes greater than 45° we measure the tangent from the
angle made by the correlation line and the line YOY'. If
the correlation line should swing to the left of YOY' into
the second quadrant, then r would be negative because the
x-values would grow less as the y-values increased. The
tangent would reach its maximum length when the correlation
line reached 135° from point X' on the circle and would
have a value of −1. Hence the values of r will vary from
+1 through 0 to −1.

[Figure XIII]

Figure XIV shows the correlation line and the two
regression lines for the data in Table XXXVIII. In that
table the mean for the mathematics series is 80.16 and is
represented in Figure XIV by the line YOY'. The mean
for the Latin series is 73.09 and is represented by the line
XOX'. The line MN is the correlation line and is drawn
so as to make an angle of 6° 7' with the X-axis. The line
CD is the line of perfect correlation and makes an angle of
45° with the X-axis. The two regression lines are ST, the
regression of y on x, which makes an angle of 13° 33' with
the X-axis, and PQ, the regression of x on y, which makes an
angle of 2° 42' with the Y-axis.

The Reliability of the Correlation Coefficient. We
found the coefficient of correlation between abilities in
Latin and in mathematics to be 0.107. The number of
cases taken to get this value was 310. This number represents
but a small part of the people in high school taking
Latin and mathematics. The question arises: If we were to
take 310 other cases and compute the correlation coefficient,
would it be 0.107? Or, if we took all the students taking Latin
and mathematics in high schools and computed the correlation
between the two abilities, would it still be 0.107?
We cannot answer these questions dogmatically and say
the coefficient of correlation thus computed would, or would
not, be 0.107, but we can apply the laws of "chance"
and speak rather dogmatically from the facts gained. We
know that the reliability of the coefficient of correlation,
just as the reliability of the mean or of the standard deviation,
depends on the normality of the distributions in
question. If the distributions are approximately normal,
the coefficient of correlation will be fairly reliable. On the
other hand, if the distributions do not approach normality,
the reliability of the coefficient of correlation becomes less.
When the distributions resemble normality, the
probable error (P.E.) may be used to estimate the probable
stability of the coefficient. We find from the normal probability
curve that P.E. = 0.6745σ.

[Figure XIV, showing the position of the correlation line and
regression lines for data in Table XXXVIII. Axis label: Ability in Mathematics.]

The formula for the probable error of the coefficient of correlation is

$$\text{P.E.}_r = 0.6745\,\frac{1 - r^2}{\sqrt{N}}$$

This formula shows that the reliability of r increases as N
increases; not directly, but in proportion to the square root of
the number of cases. This is true, of course, because when
N is large, the value of $\text{P.E.}_r$ becomes less. Therefore, to
double the reliability of a coefficient we must take four
times the number of cases; to triple the reliability we must
take nine times the number of cases, and so on. For r to
be considered fairly reliable it should be at least three times
as large as the probable error, on the ground that it is very
improbable that the true value of r falls outside r ± 3 P.E.
Applying this formula to Table XXXVIII, the correlation
between abilities in Latin and mathematics, we have

$$\text{P.E.}_r = \frac{0.6745\,(1 - 0.107^2)}{\sqrt{310}} = 0.037$$

or r = 0.107 ± 0.037. Here the correlation coefficient is a
little less than three times the probable error, which makes
it not entirely reliable as a measure of correlation. Whipple
says: "In general, a correlation, like any other determination,
to have claim to scientific attention must be at
least twice as large as its P.E., and to be perfectly satisfactory,
should be perhaps four or five times as large."
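The probable error and its dependence on the number of cases can be verified with a few lines. This is a sketch; the function name `probable_error_r` is an assumption for illustration.

```python
import math

def probable_error_r(r, n):
    """Probable error of a correlation coefficient r based on n cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

pe = probable_error_r(0.107, 310)

# How many times larger than its probable error the coefficient is.
ratio = 0.107 / pe

# Taking four times as many cases halves the probable error.
pe_four_times = probable_error_r(0.107, 4 * 310)

print(round(pe, 4), round(ratio, 2), round(pe_four_times, 4))
```

Carried to four places the probable error is about 0.0379, which the text gives as 0.037; on either figure the coefficient is between two and three times its probable error.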

Spearman's Method of Rank Correlation. Spearman's
"foot-rule" for measuring correlation is a simple method
of comparison by "rank" or "position" rather than in terms
of absolute quantity. This method is becoming popular
because of the ease of computation. It, of course, is less
accurate than the product-moment method by Pearson.
The formula for this method is

$$R = 1 - \frac{6\,\Sigma g}{N^2 - 1}$$

in which R expresses the degree of rank correlation (not to
be confused with r in the Pearson formula); g is the numerical
gain in rank of an individual in the second, as compared
with the first series; and N is the number of cases. Table
XLI illustrates the computation of R.

Table XLI. Correlation Between a Class in Addition
and Handwriting Measured by the Thorndike Scale

Pupil    Addition    Handwriting    Gains, g
  A         13           16            0
  B         11           14            0
  C         10           15            1
  D          9           11            0
  E          8           13            1
  F          7            9            0
  G          6           12            2

Substituting in the formula, $R = 1 - \frac{6 \times 4}{7^2 - 1} = 1 - \frac{24}{48} = 0.5$.

In using the data in Table XLI to illustrate the computa- 
tion of the degree of correlation by the Spearman Rank 
Method we proceed as follows: 

Rank the measures in each series in the order of their 
magnitude. We may start with either the largest or small- 
est scores in ranking them. If we start with the largest 
scores in one series, we must do the same in the other, and 
vice versa. Compute the amount of gain in rank of each 
score value in the second series over the rank of its corresponding 
score in the first series. For example: pupils 
A, B, D, and F did not gain in rank in handwriting over their 
respective ranks in addition. Pupil C ranks third in addition 
but second in handwriting, hence his gain in rank is 1. 
Pupil E ranks fifth in addition and fourth in handwriting, 
hence gains 1. Pupil G ranks seventh in addition and fifth 
in handwriting, hence gains 2. The sum of the gains is 4, 
which when substituted in the formula gives a value for R 
equal to 0.5. 
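
The steps just described can be sketched in Python as a check on the arithmetic. The function names are illustrative only, ranking starts from the largest score, and the sketch assumes there are no tied scores. Only gains are counted: a pupil who drops in rank contributes zero, as in the worked example.

```python
def ranks_descending(scores):
    """Return the rank of each score, rank 1 being the largest (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def footrule_R(first, second):
    """Spearman's foot-rule: R = 1 - 6 * sum(g) / (N^2 - 1).

    g is the gain in rank in the second series over the first;
    losses in rank are counted as zero.
    """
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("both series need the same length, at least 2")
    r1 = ranks_descending(first)
    r2 = ranks_descending(second)
    # Moving to a smaller rank number (nearer the top) is a gain.
    gains = [max(a - b, 0) for a, b in zip(r1, r2)]
    return 1 - 6 * sum(gains) / (n * n - 1), gains


# Scores of pupils A to G from the table in the text.
addition = [13, 11, 10, 9, 8, 7, 6]
handwriting = [16, 14, 15, 11, 13, 9, 12]
R, gains = footrule_R(addition, handwriting)
print(gains)  # gains for pupils A..G
print(R)      # rank correlation by the foot-rule
```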

In case of a tie in the rank in either series it is customary 
to divide the ranks in such a manner as to keep the total 
number of ranks in each series the same. If, for example, 
two scores tie for fifth place, each should be assigned a rank of 
5.5 (that is, one-half of 5+6). If three tie for fifth place, they 
should all be assigned the rank of 6 (the mean of the fifth, 
sixth, and seventh places). 
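
A minimal Python sketch of this tie rule follows. The function name is hypothetical and rank 1 is given to the largest score. Each group of tied scores receives the mean of the places it occupies, so the total of the ranks is unchanged.

```python
def average_ranks(scores):
    """Rank scores from largest (rank 1), giving tied scores the mean of their places."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend 'end' over every score equal to the one at 'pos'.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Places are 1-based: the tied group occupies places pos+1 .. end+1.
        mean_place = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_place
        pos = end + 1
    return ranks


two_tied = average_ranks([20, 18, 17, 15, 12, 12, 9])
three_tied = average_ranks([20, 18, 17, 15, 12, 12, 12, 9])
print(two_tied)    # the two scores tied for fifth place each get 5.5
print(three_tied)  # the three scores tied for fifth place each get 6.0
```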

The rank method should be used only when N is small, 
in which case its reliability is about as great as that of the more 
accurate product-moment method. 



1. Ayres, Leonard P., "A Shorter Method of Computing 
the Coefficient of Correlation," Journal of Educational Research, 
Vol. 1, March, 1920; also "The Application of Tables of Distribution 
of a Shorter Method for Computing Coefficients of Correlation," 
Journal of Educational Research, Vol. 1, April, 1920. 

2. Bowley, Arthur L., An Elementary Manual of Statistics 
(MacDonald and Evans, London, 1910). 

3. Bowley, Arthur L., The Nature and Purpose of the Measurement 
of Social Phenomena (P. S. King & Son, Ltd., London, 
1915). 

4. Davenport, Eugene, Principles of Breeding (Ginn & Co., 
1907). 

5. Elderton, W. Palin, and Elderton, Ethel M., Primer 
of Statistics (Adam and Charles Black, London, 1914). 

6. Judd, Charles H., Introduction to the Scientific Study of 
Education (Ginn & Co., 1918). 




7. King, Willford I., The Elements of Statistical Method (The 
Macmillan Co., 1912). 

8. McCall, William A., "How to Compute a Median," 
Teachers College Record, Vol. 21, March, 1920. 

9. McCall, William A., How to Measure in Education (The 
Macmillan Co., 1922). 

10. Monroe, Walter S., Measuring the Results of Teaching 
(Houghton Mifflin Co., 1918). 

11. Pearson, Karl, The Grammar of Science (Adam and 
Charles Black, London, 1911). 

12. Roberts, Herbert F., "A Practical Method of Demonstrating 
the Error of Mean Square," School Science and Mathematics, 
Vol. 19, pp. 667-692. 

13. Roberts, Herbert F., "A Demonstration of the Coefficient 
of Correlation for Elementary Students in Plant Breeding," 
School Science and Mathematics, Vol. 19, pp. 619-628. 

14. Rugg, Harold O., Statistical Methods Applied to Education 
(Houghton Mifflin Co., 1917). 

15. Secrist, Horace, An Introduction to Statistical Methods. 

16. Starch, Daniel, Educational Measurements (The Macmillan 
Co., 1917). 

17. National Society for the Study of Education, Twenty-first 
Yearbook, Part II (Public School Publishing Co., Bloomington, 
Ill., 1922). 

18. Theisen, W., Report on the Use of Some Standard Tests for 
1916-1917 (Wisconsin State Department of Public Instruction, 
Madison, Wis., 1918). 

19. Thorndike, Edward L., An Introduction to the Theory of 
Mental and Social Measurements (Teachers College, Columbia 
University, New York, 1916). 

20. West, Carl J., Introduction to Mathematical Statistics (R. G. 
Adams & Co., Columbus, O., 1918). 

21. Whipple, Guy M., Manual of Physical and Mental Tests, 
Part I (Warwick and York, 1920). 

22. Wilson, G. M., and Hoke, Kremer J., How to Measure 
(The Macmillan Co., 1920). 

23. Various Conferences on Educational Measurement; Indiana 
University Bulletins (University of Indiana, Bloomington, Ind.). 






APPENDIX 

Squares and Square Roots 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

   1        1   1.000      36     1296   6.000      71     5041   8.426
   2        4   1.414      37     1369   6.083      72     5184   8.485
   3        9   1.732      38     1444   6.164      73     5329   8.544
   4       16   2.000      39     1521   6.245      74     5476   8.602
   5       25   2.236      40     1600   6.325      75     5625   8.660
   6       36   2.449      41     1681   6.403      76     5776   8.718
   7       49   2.646      42     1764   6.481      77     5929   8.775
   8       64   2.828      43     1849   6.557      78     6084   8.832
   9       81   3.000      44     1936   6.633      79     6241   8.888
  10      100   3.162      45     2025   6.708      80     6400   8.944
  11      121   3.317      46     2116   6.782      81     6561   9.000
  12      144   3.464      47     2209   6.856      82     6724   9.055
  13      169   3.606      48     2304   6.928      83     6889   9.110
  14      196   3.742      49     2401   7.000      84     7056   9.165
  15      225   3.873      50     2500   7.071      85     7225   9.220
  16      256   4.000      51     2601   7.141      86     7396   9.274
  17      289   4.123      52     2704   7.211      87     7569   9.327
  18      324   4.243      53     2809   7.280      88     7744   9.381
  19      361   4.359      54     2916   7.348      89     7921   9.434
  20      400   4.472      55     3025   7.416      90     8100   9.487
  21      441   4.583      56     3136   7.483      91     8281   9.539
  22      484   4.690      57     3249   7.550      92     8464   9.592
  23      529   4.796      58     3364   7.616      93     8649   9.644
  24      576   4.899      59     3481   7.681      94     8836   9.695
  25      625   5.000      60     3600   7.746      95     9025   9.747
  26      676   5.099      61     3721   7.810      96     9216   9.798
  27      729   5.196      62     3844   7.874      97     9409   9.849
  28      784   5.292      63     3969   7.937      98     9604   9.899
  29      841   5.385      64     4096   8.000      99     9801   9.950
  30      900   5.477      65     4225   8.062     100    10000  10.000
  31      961   5.568      66     4356   8.124     101    10201  10.050
  32     1024   5.657      67     4489   8.185     102    10404  10.100
  33     1089   5.745      68     4624   8.246     103    10609  10.149
  34     1156   5.831      69     4761   8.307     104    10816  10.198
  35     1225   5.916      70     4900   8.367     105    11025  10.247



Squares and Square Roots (Continued) 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

 106    11236  10.296     141    19881  11.874     176    30976  13.266
 107    11449  10.344     142    20164  11.916     177    31329  13.304
 108    11664  10.392     143    20449  11.958     178    31684  13.342
 109    11881  10.440     144    20736  12.000     179    32041  13.379
 110    12100  10.488     145    21025  12.042     180    32400  13.416
 111    12321  10.536     146    21316  12.083     181    32761  13.454
 112    12544  10.583     147    21609  12.124     182    33124  13.491
 113    12769  10.630     148    21904  12.166     183    33489  13.528
 114    12996  10.677     149    22201  12.207     184    33856  13.565
 115    13225  10.724     150    22500  12.247     185    34225  13.601
 116    13456  10.770     151    22801  12.288     186    34596  13.638
 117    13689  10.817     152    23104  12.329     187    34969  13.675
 118    13924  10.863     153    23409  12.369     188    35344  13.711
 119    14161  10.909     154    23716  12.410     189    35721  13.748
 120    14400  10.954     155    24025  12.450     190    36100  13.784
 121    14641  11.000     156    24336  12.490     191    36481  13.820
 122    14884  11.045     157    24649  12.530     192    36864  13.856
 123    15129  11.091     158    24964  12.570     193    37249  13.892
 124    15376  11.136     159    25281  12.610     194    37636  13.928
 125    15625  11.180     160    25600  12.649     195    38025  13.964
 126    15876  11.225     161    25921  12.689     196    38416  14.000
 127    16129  11.269     162    26244  12.728     197    38809  14.036
 128    16384  11.314     163    26569  12.767     198    39204  14.071
 129    16641  11.358     164    26896  12.806     199    39601  14.107
 130    16900  11.402     165    27225  12.845     200    40000  14.142
 131    17161  11.446     166    27556  12.884     201    40401  14.177
 132    17424  11.489     167    27889  12.923     202    40804  14.213
 133    17689  11.533     168    28224  12.961     203    41209  14.248
 134    17956  11.576     169    28561  13.000     204    41616  14.283
 135    18225  11.619     170    28900  13.038     205    42025  14.318
 136    18496  11.662     171    29241  13.077     206    42436  14.353
 137    18769  11.705     172    29584  13.115     207    42849  14.387
 138    19044  11.747     173    29929  13.153     208    43264  14.422
 139    19321  11.790     174    30276  13.191     209    43681  14.457
 140    19600  11.832     175    30625  13.229     210    44100  14.491


Squares and Square Roots (Continued) 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

 211    44521  14.526     246    60516  15.684     281    78961  16.763
 212    44944  14.560     247    61009  15.716     282    79524  16.793
 213    45369  14.595     248    61504  15.748     283    80089  16.823
 214    45796  14.629     249    62001  15.780     284    80656  16.852
 215    46225  14.663     250    62500  15.811     285    81225  16.882
 216    46656  14.697     251    63001  15.843     286    81796  16.912
 217    47089  14.731     252    63504  15.875     287    82369  16.941
 218    47524  14.765     253    64009  15.906     288    82944  16.971
 219    47961  14.799     254    64516  15.937     289    83521  17.000
 220    48400  14.832     255    65025  15.969     290    84100  17.029
 221    48841  14.866     256    65536  16.000     291    84681  17.059
 222    49284  14.900     257    66049  16.031     292    85264  17.088
 223    49729  14.933     258    66564  16.062     293    85849  17.117
 224    50176  14.967     259    67081  16.093     294    86436  17.146
 225    50625  15.000     260    67600  16.125     295    87025  17.176
 226    51076  15.033     261    68121  16.155     296    87616  17.205
 227    51529  15.067     262    68644  16.186     297    88209  17.234
 228    51984  15.100     263    69169  16.217     298    88804  17.263
 229    52441  15.133     264    69696  16.248     299    89401  17.292
 230    52900  15.166     265    70225  16.279     300    90000  17.321
 231    53361  15.199     266    70756  16.310     301    90601  17.349
 232    53824  15.232     267    71289  16.340     302    91204  17.378
 233    54289  15.264     268    71824  16.371     303    91809  17.407
 234    54756  15.297     269    72361  16.401     304    92416  17.436
 235    55225  15.330     270    72900  16.432     305    93025  17.464
 236    55696  15.362     271    73441  16.462     306    93636  17.493
 237    56169  15.395     272    73984  16.492     307    94249  17.521
 238    56644  15.427     273    74529  16.523     308    94864  17.550
 239    57121  15.460     274    75076  16.553     309    95481  17.578
 240    57600  15.492     275    75625  16.583     310    96100  17.607
 241    58081  15.524     276    76176  16.613     311    96721  17.635
 242    58564  15.556     277    76729  16.643     312    97344  17.664
 243    59049  15.588     278    77284  16.673     313    97969  17.692
 244    59536  15.620     279    77841  16.703     314    98596  17.720
 245    60025  15.652     280    78400  16.733     315    99225  17.748


Squares and Square Roots (Continued) 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

 316    99856  17.776     351   123201  18.735     386   148996  19.647
 317   100489  17.804     352   123904  18.762     387   149769  19.672
 318   101124  17.833     353   124609  18.788     388   150544  19.698
 319   101761  17.861     354   125316  18.815     389   151321  19.723
 320   102400  17.889     355   126025  18.841     390   152100  19.748
 321   103041  17.916     356   126736  18.868     391   152881  19.774
 322   103684  17.944     357   127449  18.894     392   153664  19.799
 323   104329  17.972     358   128164  18.921     393   154449  19.824
 324   104976  18.000     359   128881  18.947     394   155236  19.849
 325   105625  18.028     360   129600  18.974     395   156025  19.875
 326   106276  18.055     361   130321  19.000     396   156816  19.900
 327   106929  18.083     362   131044  19.026     397   157609  19.925
 328   107584  18.111     363   131769  19.053     398   158404  19.950
 329   108241  18.138     364   132496  19.079     399   159201  19.975
 330   108900  18.166     365   133225  19.105     400   160000  20.000
 331   109561  18.193     366   133956  19.131     401   160801  20.025
 332   110224  18.221     367   134689  19.157     402   161604  20.050
 333   110889  18.248     368   135424  19.183     403   162409  20.075
 334   111556  18.276     369   136161  19.209     404   163216  20.100
 335   112225  18.303     370   136900  19.235     405   164025  20.125
 336   112896  18.330     371   137641  19.261     406   164836  20.149
 337   113569  18.358     372   138384  19.287     407   165649  20.174
 338   114244  18.385     373   139129  19.313     408   166464  20.199
 339   114921  18.412     374   139876  19.339     409   167281  20.224
 340   115600  18.439     375   140625  19.365     410   168100  20.248
 341   116281  18.466     376   141376  19.391     411   168921  20.273
 342   116964  18.493     377   142129  19.416     412   169744  20.298
 343   117649  18.520     378   142884  19.442     413   170569  20.322
 344   118336  18.547     379   143641  19.468     414   171396  20.347
 345   119025  18.574     380   144400  19.494     415   172225  20.372
 346   119716  18.601     381   145161  19.519     416   173056  20.396
 347   120409  18.628     382   145924  19.545     417   173889  20.421
 348   121104  18.655     383   146689  19.570     418   174724  20.445
 349   121801  18.682     384   147456  19.596     419   175561  20.469
 350   122500  18.708     385   148225  19.621     420   176400  20.494




Squares and Square Roots (Continued) 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

 421   177241  20.518     456   207936  21.354     491   241081  22.159
 422   178084  20.543     457   208849  21.378     492   242064  22.181
 423   178929  20.567     458   209764  21.401     493   243049  22.204
 424   179776  20.591     459   210681  21.424     494   244036  22.226
 425   180625  20.616     460   211600  21.448     495   245025  22.249
 426   181476  20.640     461   212521  21.471     496   246016  22.271
 427   182329  20.664     462   213444  21.494     497   247009  22.293
 428   183184  20.688     463   214369  21.517     498   248004  22.316
 429   184041  20.712     464   215296  21.541     499   249001  22.338
 430   184900  20.736     465   216225  21.564     500   250000  22.361
 431   185761  20.761     466   217156  21.587     501   251001  22.383
 432   186624  20.785     467   218089  21.610     502   252004  22.405
 433   187489  20.809     468   219024  21.633     503   253009  22.428
 434   188356  20.833     469   219961  21.656     504   254016  22.450
 435   189225  20.857     470   220900  21.679     505   255025  22.472
 436   190096  20.881     471   221841  21.703     506   256036  22.494
 437   190969  20.905     472   222784  21.726     507   257049  22.517
 438   191844  20.928     473   223729  21.749     508   258064  22.539
 439   192721  20.952     474   224676  21.772     509   259081  22.561
 440   193600  20.976     475   225625  21.794     510   260100  22.583
 441   194481  21.000     476   226576  21.817     511   261121  22.605
 442   195364  21.024     477   227529  21.840     512   262144  22.627
 443   196249  21.048     478   228484  21.863     513   263169  22.650
 444   197136  21.071     479   229441  21.886     514   264196  22.672
 445   198025  21.095     480   230400  21.909     515   265225  22.694
 446   198916  21.119     481   231361  21.932     516   266256  22.716
 447   199809  21.142     482   232324  21.954     517   267289  22.738
 448   200704  21.166     483   233289  21.977     518   268324  22.760
 449   201601  21.190     484   234256  22.000     519   269361  22.782
 450   202500  21.213     485   235225  22.023     520   270400  22.804
 451   203401  21.237     486   236196  22.045     521   271441  22.825
 452   204304  21.260     487   237169  22.068     522   272484  22.847
 453   205209  21.284     488   238144  22.091     523   273529  22.869
 454   206116  21.307     489   239121  22.113     524   274576  22.891
 455   207025  21.331     490   240100  22.136     525   275625  22.913


Squares and Square Roots (Continued) 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

 526   276676  22.935     561   314721  23.685     596   355216  24.413
 527   277729  22.956     562   315844  23.707     597   356409  24.434
 528   278784  22.978     563   316969  23.728     598   357604  24.454
 529   279841  23.000     564   318096  23.749     599   358801  24.474
 530   280900  23.022     565   319225  23.770     600   360000  24.495
 531   281961  23.043     566   320356  23.791     601   361201  24.515
 532   283024  23.065     567   321489  23.812     602   362404  24.536
 533   284089  23.087     568   322624  23.833     603   363609  24.556
 534   285156  23.108     569   323761  23.854     604   364816  24.576
 535   286225  23.130     570   324900  23.875     605   366025  24.597
 536   287296  23.152     571   326041  23.896     606   367236  24.617
 537   288369  23.173     572   327184  23.917     607   368449  24.637
 538   289444  23.195     573   328329  23.937     608   369664  24.658
 539   290521  23.216     574   329476  23.958     609   370881  24.678
 540   291600  23.238     575   330625  23.979     610   372100  24.698
 541   292681  23.259     576   331776  24.000     611   373321  24.718
 542   293764  23.281     577   332929  24.021     612   374544  24.739
 543   294849  23.302     578   334084  24.042     613   375769  24.759
 544   295936  23.324     579   335241  24.062     614   376996  24.779
 545   297025  23.345     580   336400  24.083     615   378225  24.799
 546   298116  23.367     581   337561  24.104     616   379456  24.819
 547   299209  23.388     582   338724  24.125     617   380689  24.839
 548   300304  23.409     583   339889  24.145     618   381924  24.860
 549   301401  23.431     584   341056  24.166     619   383161  24.880
 550   302500  23.452     585   342225  24.187     620   384400  24.900
 551   303601  23.473     586   343396  24.207     621   385641  24.920
 552   304704  23.495     587   344569  24.228     622   386884  24.940
 553   305809  23.516     588   345744  24.249     623   388129  24.960
 554   306916  23.537     589   346921  24.269     624   389376  24.980
 555   308025  23.558     590   348100  24.290     625   390625  25.000
 556   309136  23.580     591   349281  24.310     626   391876  25.020
 557   310249  23.601     592   350464  24.331     627   393129  25.040
 558   311364  23.622     593   351649  24.352     628   394384  25.060
 559   312481  23.643     594   352836  24.372     629   395641  25.080
 560   313600  23.664     595   354025  24.393     630   396900  25.100


Squares and Square Roots (Continued) 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

 631   398161  25.120     666   443556  25.807     701   491401  26.476
 632   399424  25.140     667   444889  25.826     702   492804  26.495
 633   400689  25.159     668   446224  25.846     703   494209  26.514
 634   401956  25.179     669   447561  25.865     704   495616  26.533
 635   403225  25.199     670   448900  25.884     705   497025  26.552
 636   404496  25.219     671   450241  25.904     706   498436  26.571
 637   405769  25.239     672   451584  25.923     707   499849  26.589
 638   407044  25.259     673   452929  25.942     708   501264  26.608
 639   408321  25.278     674   454276  25.962     709   502681  26.627
 640   409600  25.298     675   455625  25.981     710   504100  26.646
 641   410881  25.318     676   456976  26.000     711   505521  26.665
 642   412164  25.338     677   458329  26.019     712   506944  26.683
 643   413449  25.357     678   459684  26.038     713   508369  26.702
 644   414736  25.377     679   461041  26.058     714   509796  26.721
 645   416025  25.397     680   462400  26.077     715   511225  26.739
 646   417316  25.417     681   463761  26.096     716   512656  26.758
 647   418609  25.436     682   465124  26.115     717   514089  26.777
 648   419904  25.456     683   466489  26.134     718   515524  26.796
 649   421201  25.475     684   467856  26.153     719   516961  26.814
 650   422500  25.495     685   469225  26.173     720   518400  26.833
 651   423801  25.515     686   470596  26.192     721   519841  26.851
 652   425104  25.534     687   471969  26.211     722   521284  26.870
 653   426409  25.554     688   473344  26.230     723   522729  26.889
 654   427716  25.573     689   474721  26.249     724   524176  26.907
 655   429025  25.593     690   476100  26.268     725   525625  26.926
 656   430336  25.612     691   477481  26.287     726   527076  26.944
 657   431649  25.632     692   478864  26.306     727   528529  26.963
 658   432964  25.652     693   480249  26.325     728   529984  26.981
 659   434281  25.671     694   481636  26.344     729   531441  27.000
 660   435600  25.690     695   483025  26.363     730   532900  27.019
 661   436921  25.710     696   484416  26.382     731   534361  27.037
 662   438244  25.729     697   485809  26.401     732   535824  27.055
 663   439569  25.749     698   487204  26.420     733   537289  27.074
 664   440896  25.768     699   488601  26.439     734   538756  27.092
 665   442225  25.788     700   490000  26.458     735   540225  27.111



Squares and Square Roots (Continued) 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

 736   541696  27.129     771   594441  27.767     806   649636  28.390
 737   543169  27.148     772   595984  27.785     807   651249  28.408
 738   544644  27.166     773   597529  27.803     808   652864  28.425
 739   546121  27.185     774   599076  27.821     809   654481  28.443
 740   547600  27.203     775   600625  27.839     810   656100  28.460
 741   549081  27.221     776   602176  27.857     811   657721  28.478
 742   550564  27.240     777   603729  27.875     812   659344  28.496
 743   552049  27.258     778   605284  27.893     813   660969  28.513
 744   553536  27.276     779   606841  27.911     814   662596  28.531
 745   555025  27.295     780   608400  27.928     815   664225  28.548
 746   556516  27.313     781   609961  27.946     816   665856  28.566
 747   558009  27.331     782   611524  27.964     817   667489  28.583
 748   559504  27.350     783   613089  27.982     818   669124  28.601
 749   561001  27.368     784   614656  28.000     819   670761  28.618
 750   562500  27.386     785   616225  28.018     820   672400  28.636
 751   564001  27.404     786   617796  28.036     821   674041  28.653
 752   565504  27.423     787   619369  28.054     822   675684  28.671
 753   567009  27.441     788   620944  28.071     823   677329  28.688
 754   568516  27.459     789   622521  28.089     824   678976  28.705
 755   570025  27.477     790   624100  28.107     825   680625  28.723
 756   571536  27.495     791   625681  28.125     826   682276  28.740
 757   573049  27.514     792   627264  28.142     827   683929  28.758
 758   574564  27.532     793   628849  28.160     828   685584  28.775
 759   576081  27.550     794   630436  28.178     829   687241  28.792
 760   577600  27.568     795   632025  28.196     830   688900  28.810
 761   579121  27.586     796   633616  28.213     831   690561  28.827
 762   580644  27.604     797   635209  28.231     832   692224  28.844
 763   582169  27.622     798   636804  28.249     833   693889  28.862
 764   583696  27.641     799   638401  28.267     834   695556  28.879
 765   585225  27.659     800   640000  28.284     835   697225  28.896
 766   586756  27.677     801   641601  28.302     836   698896  28.914
 767   588289  27.695     802   643204  28.320     837   700569  28.931
 768   589824  27.713     803   644809  28.337     838   702244  28.948
 769   591361  27.731     804   646416  28.355     839   703921  28.965
 770   592900  27.749     805   648025  28.373     840   705600  28.983


Squares and Square Roots (Continued) 

 No.   Square  Square     No.   Square  Square     No.   Square  Square
                 Root                     Root                     Root

 841   707281  29.000     876   767376  29.597     911   829921  30.183
 842   708964  29.017     877   769129  29.614     912   831744  30.199
 843   710649  29.034     878   770884  29.631     913   833569  30.216
 844   712336  29.052     879   772641  29.648     914   835396  30.232
 845   714025  29.069     880   774400  29.665     915   837225  30.249
 846   715716  29.086     881   776161  29.682     916   839056  30.265
 847   717409  29.103     882   777924  29.698     917   840889  30.282
 848   719104  29.120     883   779689  29.715     918   842724  30.299
 849   720801  29.138     884   781456  29.732     919   844561  30.315
 850   722500  29.155     885   783225  29.749     920   846400  30.332
 851   724201  29.172     886   784996  29.766     921   848241  30.348
 852   725904  29.189     887   786769  29.783     922   850084  30.364
 853   727609  29.206     888   788544  29.799     923   851929  30.381
 854   729316  29.223     889   790321  29.816     924   853776  30.397
 855   731025  29.240     890   792100  29.833     925   855625  30.414
 856   732736  29.257     891   793881  29.850     926   857476  30.430
 857   734449  29.275     892   795664  29.866     927   859329  30.447
 858   736164  29.292     893   797449  29.883     928   861184  30.463
 859   737881  29.309     894   799236  29.900     929   863041  30.480
 860   739600  29.326     895   801025  29.917     930   864900  30.496
 861   741321  29.343     896   802816  29.933     931   866761  30.512
 862   743044  29.360     897   804609  29.950     932   868624  30.529
 863   744769  29.377     898   806404  29.967     933   870489  30.545
 864   746496  29.394     899   808201  29.983     934   872356  30.561
 865   748225  29.411     900   810000  30.000     935   874225  30.578
 866   749956  29.428     901   811801  30.017     936   876096  30.594
 867   751689  29.445     902   813604  30.033     937   877969  30.610
 868   753424  29.462     903   815409  30.050     938   879844  30.627
 869   755161  29.479     904   817216  30.067     939   881721  30.643
 870   756900  29.496     905   819025  30.083     940   883600  30.659
 871   758641  29.513     906   820836  30.100     941   885481  30.676
 872   760384  29.530     907   822649  30.116     942   887364  30.692
 873   762129  29.547     908   824464  30.133     943   889249  30.708
 874   763876  29.563     909   826281  30.150     944   891136  30.725
 875   765625  29.580     910   828100  30.166     945   893025  30.741





APPENDIX 
Squares and Square Roots (Continued)



No.   Square     Sq. Root     No.    Square       Sq. Root
946   89 49 16   30.757       976    95 25 76     31.241
947   89 68 09   30.773       977    95 45 29     31.257
948   89 87 04   30.790       978    95 64 84     31.273
949   90 06 01   30.806       979    95 84 41     31.289
950   90 25 00   30.822       980    96 04 00     31.305
951   90 44 01   30.838       981    96 23 61     31.321
952   90 63 04   30.854       982    96 43 24     31.337
953   90 82 09   30.871       983    96 62 89     31.353
954   91 01 16   30.887       984    96 82 56     31.369
955   91 20 25   30.903       985    97 02 25     31.385
956   91 39 36   30.919       986    97 21 96     31.401
957   91 58 49   30.935       987    97 41 69     31.417
958   91 77 64   30.952       988    97 61 44     31.433
959   91 96 81   30.968       989    97 81 21     31.448
960   92 16 00   30.984       990    98 01 00     31.464
961   92 35 21   31.000       991    98 20 81     31.480
962   92 54 44   31.016       992    98 40 64     31.496
963   92 73 69   31.033       993    98 60 49     31.512
964   92 92 96   31.048       994    98 80 36     31.528
965   93 12 25   31.064       995    99 00 25     31.544
966   93 31 56   31.081       996    99 20 16     31.559
967   93 50 89   31.097       997    99 40 09     31.575
968   93 70 24   31.113       998    99 60 04     31.591
969   93 89 61   31.129       999    99 80 01     31.607
970   94 09 00   31.145       1000   1 00 00 00   31.623
971   94 28 41   31.161
972   94 47 84   31.177
973   94 67 29   31.193
974   94 86 76   31.209
975   95 06 25   31.225














Accuracy, standard of, 270. 

Aims of education, statement of
and adherence to, 10; Bobbitt's
statement of, 10; general, 11;
value of general limited, 11;
must be definite, 13.

American people, a practical folk,
6; have faith in schools, 52;
support schools lavishly, 52.

Arithmetic mean, 279; weighted,
281; computation of by short
method, 283.

"At age" and normal child, 90.

Averages, 279. 

Ayres, criticizes Binet tests, 84-85,
93-94; handwriting scale, 181;
spelling scale, 255; measurement
retardation, acceleration,
and elimination, 254-257;
computation coefficient of
correlation, 337-344.



Bibliography. See end of each chapter.
Binet, conception of intelligence, 



between individuals one of kind,
82; his later conception, 83.
Binet-Simon tests, 67-68; synopsis
of, 68-70; innovations, 72;
brought constituent functions of
intelligence into play, 72; kinds
of mental functions brought into
play, 74; provisional scale,
1905, 75; problems to be solved,
75-76; Ayres criticizes, 84-85;
age at which tests should be
assigned, 86-87; problems in
scoring, 87-88; all-or-none
method, 88; point scale for
measuring, 88; number of tests
to be passed each age-level, 88;
with what tests shall examination
begin, 89; give more than
composite picture, 90; criticisms
of, 93; do not determine
limits of trait, 94-95; some
favorable criticism of, 99; revisions
and extension of, 104-122;
Stanford revision, 104-107.

Black, W. W., 36. 

Bobbitt, J. F., 10.

Bowley, on correlation, 325. 

Bridges, point scale, 111-117. 

Burgess, measurement silent
reading, 172.

Bureau of Standards, Washington,
D. C., 5, 40.

Burt, conception of general
intelligence, 65; measurement of
backward children, 113.

Business, compared with schools,
42; variables in, 42.



Cattell, establishes psychological
laboratory, 60.

Central tendency, measurement
of, Ch. X, 278-303.

Class intervals, 274. 

Classification of pupils by edu- 
cational tests, 190-192. 

Confidence of public, must
cultivate, 49.

Correlation, measurement of, Ch.
XII, 324-360; need for, 324;
illustrating computation of, data
ungrouped, 328; data grouped
and complex, 331-337; perfect
positive, 331; computing by
Ayres' short method, 337-344.






Creeds, educational, 12-13;
scientist and, 13.

Culture, emphasizes things of
mind, 26; development of, 26.

Curve plotting, 273.



Danger to be avoided, 12. 
Data, on school-room problems
lacking, 25; rules for tabulating,



132. 

Defective child, compared with 
normal, 100-102. 

De Sanctis tests for mental
defection, 121.

Deviation, computation of mean,
308; with grouped data, 310;
general formula for computing,
313; standard computation of,
315; mean and standard compared,
317-321.

Diagnostic tests. See Tests.

Distribution, of subnormal
individuals, 78; table of, 275;
analysis of, 275.

Downey, on will profile, 142. 



Ebbinghaus' conception of
intelligence, 64.

Economy, affected through small
savings, 16; applied to individuals,
18; in education, 18-19;
through measurements, Ch.
II, 10-55.

Educative situation, 8. 

Educational age, 192; compared 
to mental age, 192. 

Educational measurements, has
scoffers and zealots, 43; cannot
plot curve of genius, 43.

Educational quotient (E.Q.), 192,
194.

Efficiency, demand for, 1; in the
Gary schools, 2; teachers', 2;
books on, 2-3; how mechanic
determines, 3; of school system,
3; American citizen, 3-4; measured
in higher education, 6-7;
working hypothesis for acquiring,
8.

Energy must be conserved, 15. 

Equation of straight line of
regression, 351.

Error, question of, 262; compensating
vs. cumulative, 270.

Examinations, time consumed in
giving, 147; attitude of teachers
and pupils toward, 148.

Experience, profiting by that of
others, 39; in business, 39-40;
in medicine, 40; in manufacturing
stoves, 41.



Facts, inability to show works
hardships, 32; business man
and, 33; "show up" school
by, 34.

Factual basis for education, 20. 

Fechner, 57. 

Fields of educational tests and
measurements, 53-54.

Form board for measuring
intelligence, 108.

Freeman handwriting scale, for- 
mation of, 182. 

Genius, differs from intelligence,

Goddard, 100.

Gray, handwriting score card,
182-184.

Gregory, study of reading
vocabularies, 237; scoring American
histories, 240-247.

Gregory-Spencer geography tests,
214; divisions of, 216; selection
of cities in, 217; advantages
of tests thus designed, 221;
determination of scores in, 732;
objections, 224; effects of incorrect
statements, 225.

Group intelligence tests, 122-138;
principles involved, 123; requirements
for, 124; Terman,
125; National, 127; Haggerty,
129; Otis, 131; Dearborn, Iffi
number given, l'~



Habermann, definition of normal
individual, 78.

Haggerty, intelligence examinations,
129; summarizes work of
measurements, 140; as factors
which condition success, 140.

Hall, establishes psychological 
laboratory, 60. 

Hardwick, point scale, 111-117.

Healy, picture completion test,
108; criticizes mental testing,
143.

Human energy must be conserved, 
15. 

Hume's description of meta- 
physical sciences, 20-21. 



Inductive method, not used in 
pedagogy, 24. 

Intelligence, measurement of,
Chs. III-IV, 56-144; some
major problems of, 56; what
it is, 8^-65; definitions of, 63;
differs from memory and genius,
63-67; Zeilen attempts to
measure, 65; as general faculty
of the mind, 65; Burt's investigations
of, 65; general faculty
of, 65-67; inability to define
does not prohibit measurements,
67; maturity of, 70;
depends on correlation and
interfunctioning, 73; differences
in degree and kind, 82; choosing
tests to measure, 83; tests must
not be influenced by, 84; tests
must have symptomatic value,
85; tests must not depend on
use of language, 85-86; group
tests of, 122-136; levels for
various occupations, 132; summary
and evaluation, 136-145;
methods crude, 137; symposium
on, 138; suggestions to
teachers, 144.

Intelligence quotient (I.Q.), 81, 
192. 

Jaederholm, experiments on
subnormal children, 83.
James, 38. 



Johnson, on grading high-school
students, 163.

Jones, studies on spelling vocab- 
ularies, 236. 



Kant, on psychology as a science,
59.

Kelly, F. J., on teachers' marks,
164.

Keeping records of methods tried,
49.



Kuhlmann, on mental maturity, 



Limited quantities make measurements
necessary, 14.

Lowell, on simplicity of tests, 124.



Marking system, now in vogue, 
150; inadequacy of, 156; in- 
efficient, 158. 

McCall, location of zero point, 
199; definition of median, 289. 

Mean, 279; computation of mean
deviation, 308.

Measurements, in physical sciences,
6; teachers should use, 9;
suggestions to teachers for, 144;
of school achievements new,
152; more exact make education
a science, 164; by opinion,
comparison, and standardized
tests, 189; in other fields, Ch.
VIII, 232-258; of materials of
instruction, 233-249; of spelling
vocabulary, 233; of physical
growth of children, 249-251; of
school buildings, 251-253; of
retardation, acceleration, and
elimination, 253-258; educational
compared with other
fields, 264; of central tendency,
Ch. X, 273-303; of variability,
304-323.

Measures, distribution of about 
central tendency, 262; undis- 
tributed, 271. 



380 INDEX

Median, 287; definitions of, 287-289;
formulae for finding, 290;
computation of in simple distribution,
291; in complex distribution,
293; compared with
mean and mode, 302.

Medical criterion for separating 
subnormal from normal, 79-80. 

Mental age, coefficient of, 90-91; 
how to find, 106. 

Mental maturity, age of, 91-93; 

Meumann, conception of intelligence,
64; criticizes idea of
general factor in intelligence,
66; proposed reorganization of
Binet scale, 117.

Mode, 286, 302.



National intelligence tests, 127.

Need for definite measurements,
Ch. V, 147-179.

Normal probability curve, 77.

Normal frequency curve. See
Normal probability curve.

Normal individual, Habermann's
definition of, 78; criteria for
separating from subnormal, 79.

Norsworthy, experiments on
feeble-minded children, 83.



Objections to educational tests,
43-48.

Obstacles to rational educational
reform, 28.

Opinions, kinds and uses;
when worthless, 22; philosophical
vs. knowable facts, 24-25;
when no excuse for, 26.

Otis group intelligence tests, 131.

Paterson, scale of performance
tests, 109-110.

Pearson, experiments on subnormal
children, 83; on correlation,
325-326; equation for

" ■" es, 353. 

Pedagogical criterion for separating
subnormal from normal,




Performance tests, 8

Percentiles, 303.

Picture completion tests, 107-108.

Pintner, scale of performance
tests, 109-110.

Point scale for measuring mental
ability, 111-117; distribution of
tests, 114-115.

Porteus, on mental maturity, 92;
on limitations of Binet tests, 97.

Pragmatism, 6.

Probable error (P.E.), 306.

Progress conditioned by ability


Psychiatrist, work of, 61-63.

Psychological criteria for separating
subnormal from normal,
79-80.

Psychological measurements
retarded, 59; made slow progress
from Aristotle, 61.

Public asking for a ledger account, 
31; interested in education, 61. 

Pyle, criticizes Binet tests, 97. 



Rugg, definition of median, 287.



Salt Lake City survey, 30. 

Scale of performance tests, 109. 

Scale, definition of, 159; ideal
must have, 161; has been subjective,
167; some characteristics
of ideal, 196; zero point of,
196; making steps equal, 2(0-211;
must measure desired
product, 211.

Scores, value to be assigned, 226; 
weighting, 2^; accomuutjcoi 






and difficulty, 229; interval, 
289. 

Scoring, tests and treatment of
measures, Ch. VII, 227; teachers'
judgment in, 228; American
histories, 240-249.

School, purpose of, 18; is a state 
monopoly, 32; compared to 
business, 42; variables in, 42. 

Seashore criticizes Binet tests, 
96-97. 

Secrist, definition of median, 288. 

Series, discrete and continuous, 
271. 

Single variable, law of, 172; illus- 
tration of, 172-174. 

Social economic criteria for mental 
deficiency, 78-79. 

Spearman, conception of general 
intelligence, 65-66; mental ma- 
turity, 92; method of rank 
correlation, 362. 

Specifications, manufacturer has, 
41; schools should have, 42. 

Spelling vocabulary, measurement
of, 233; of Ayres' scale, 234.

Standards, establishment of, 34-39;
bricklayer has, 36; three
kinds needed, 36; quantity, 36;
time, 38; quality, 39; changing,
39.

Stanford Revision Binet Scale, 
104-107. 

State examination questions ex- 
amined, 216. 

Statistics, general statement of, 
Ch. IX, 260-277; uses in other 
fields, 261; definition of, 267; 
laws of statistical regularity, 
268; methods of, 269; limita- 
tions of, 270; understanding 
statistical formulas, 276. 

Stern, definition of intelligence,
63; coefficient of mental age, 90.

Strayer and Engelhardt, score 
card, 251-253. 

Superintendents' meeting at Indi- 
anapolis, 28. 

Supervision, improved by meas- 
urements, 165. 



T scale, 199-202. 



Talent, differs from intelligence, 
63. 

Teachers, influence measured, 7; 
must know why changes are 
made, 49-50; must take initia- 
tive, 153. 

Technical scientist and educa- 
tional creeds, 12. 

Terman, use of intelligence quotient,
91; on maturity of intelligence,
92; revision of Binet
scale, 104-107; group tests,
125; symposium on intelligence,
138; T scale, 201.

Tests, educational curiosities, 9;
limitations of, 91; use of intelligence,
132; army mental, 132;
purposes of not understood, 154;
what they measure, 155; do
not indicate cause of conditions,
168; how differ from other examinations,
169; how made,
169-172; when should be given,
174; number of times, 176;
how improve instruction, 177;
kinds most important, 178;
classifying and designing, 180-185;
diagnostic vs. general, 180;
degree to which diagnostic, 181;
used in Cleveland survey, 185;
formal and reasoning, 185; rate
and development, 186; quantity,
difficulty, and time, 187;
subject-matter for, 194; determined
by information desired,
196; simple in application, 211;
not require too much time, 212;
scoring and treatment of measures,
Ch. VII, 214-230; problems
in scoring, 214; making
a geography test, 214; values
to be assigned to scores, 226.

Testing, problems of, 100.

Thinking, quantitatively, 5-6. 

Thorndike, 48; on general faculty
of intelligence, 66; established
zero in handwriting, 198; definition
of median, 288.

Time, relation to school products, 
28. 

Titchener, 58; establishes psychological
laboratory, 60.

Tradition, strength of, 9. 






Variability, measurement of, Ch.
XI, 304-323; how measured,
305; measures of absolute, 305;
coefficient of, 321; Pearson
formula for, 322; Thorndike
formula for, 323.

Vocabularies, 233; reading and
spelling of Oregon school children,
337.



Wallin, criticizes Binet scale, 118.

Waste, teachers should eliminate,
9; elimination of, 14; not peculiar
to American schools, 14;
opportunities for in education,
15; how business man eliminates,
16; much and varied in
education, 17.



Watt, 4. 

Weber, 58; experimental work in
psychology, 58-59; laws, 58.

Woody, arithmetic scale, 195.

Wooters, 21. 

Wundt, 57; establishes psychological
laboratory, 57-58; effects
of laboratories, 60.



Yerkes, point scale, 111-117. 



Zeilen, attempts to measure in- 
telligence, 65. 

Zero, point of scale, 161, 196; in 
composition, 197; in penman- 
ship, 198; McCall establishes, 
199; in T scale, 200. 






371.27 ,GB22 C.1

Fundamentals of educational me

Stanford University Libraries




